ABSTRACT



Design and implementation of recursive digital filters 
with fixed point arithmetic using special hardware are 
considered in detail and applied to a mechanization of a 
second order filter structure with variable coefficients. 

Two new methods of performing quantization after arithmetic operations within a digital filter are presented: quantization after addition and quantization before multiplication. Both methods are shown to be applicable to hardware implementation of digital filters and to offer advantages over the usual quantization after multiplication. Error bounds are derived for these two quantization schemes and compared with the results previously obtained by other authors. It is concluded that quantization before multiplication is the most suitable for hardware filter implementation. A design modification of the presently available hardware chips that permits round-off or truncation before multiplication is presented.
I. INTRODUCTION



A. IMPORTANCE AND APPLICATIONS OF DIGITAL FILTERS 

A digital filter (D.F.) is defined [29] as a computational process or algorithm by which an input digital (discrete time and amplitude) signal, or sequence of numbers, is transformed into an output digital signal.

A digital filter can be compared to an analog filter as illustrated in Figure 1-1. A signal source x(t) is fed into the two processors. If the output y*(t) matches the output of the analog filter for all x(t), the upper and lower signal channels are equivalent, and the digital processor is equivalent to the analog filter, but operates on the digital signal x*(t) from the analog-to-digital converter (ADC). The digital processor can therefore be called a digital filter.

A digital filter can be implemented as a subroutine in a general purpose computer or as hardware in the form of a special purpose digital processor. In the hardware form, a D.F. is a collection of storage elements, adders and multipliers connected together in a prescribed way (the filter structure), much as a continuous filter is an ordered connection of resistors, capacitors, inductors and active gain elements.
FIGURE 1-1 ANALOG AND DIGITAL FILTER COMPARISON



The advantages of digital filters over their analog 
counterparts are numerous [31]. Some of the advantages are: 

a) arbitrarily high precision in the computational process,

b) no parameter or component value drift,

c) flexibility in the processing procedure, which allows the construction of adaptive filters,

d) no need for impedance matching,

e) possibility of using time-sharing techniques,

f) easy realization of complex circuits,

g) high reliability,

h) small circuit size,

i) decreasing costs for mass-produced basic building blocks.

The following are typical examples of the superiority of digital filters over similar analog filter types: (1) Linear phase filters with extremely fast roll-off and either narrow or wide passbands or stopbands can be implemented digitally without introducing nonlinear phase shift in the passband. (2) Comb filters are particularly useful for isolating repetitive signals of a known frequency; in sonar systems, for example, signals must be isolated from noise or other unwanted signals. (3) The extremely critical tolerances on crossover amplitude and phase characteristics of filters operating on adjacent passbands can be met to any specified accuracy without drift or component aging effects. These accuracy and drift problems are encountered in spectrum analyzers and synthesizers with applications in radar, sonar, communications, and channel selectors. (4) Speech analysis and synthesis sometimes require a nonlinear phase response because both the magnitude and phase characteristics must be detected. In addition, the filter characteristics often must be varied, and this is easily programmed with digital filters. (5) Two-dimensional filtering is widely used in image and geological data processing.

B. PREVIEW OF RESULTS

Digital filter implementation has been confined primarily to computer programs for simulation or for processing relatively small amounts of data, usually not in real time. However, the rapid development of integrated-circuit technology, especially large-scale integration (LSI), is creating increasing interest in hardware digital filter implementation. Mechanization hardware is discussed in Chapter II and its use in a digital filter design in Chapter III.

The design of a D.F. can use methods similar to those used for analog filters. Pole-zero analysis is essentially the same in the Z-domain used for discrete systems as in the Laplace transform domain used for continuous systems. Appendix A presents the Z-transform and the mapping of the s-plane into the z-plane, and discusses the significance of the pole positions. The transfer function decomposition methods of continuous systems are also easily applied to the Z-domain filter function and result in the same filter forms, as shown in the discrete transfer function realization methods presented in Appendix B and in the discussion of functional transforms in Appendix C. An example of a D.F. design using a Z-transform technique, together with its hardware implementation, is given at the end of Chapter III. A complex application of the North American Rockwell building chips to the hardware design of a second order section, using a structure that permits variable coefficients and word lengths, is presented in detail in Chapter IV.

Errors due to the finite precision of the number representation always occur in a D.F. The quantization noise problem is particularly serious in recursive D.F.s, in which the algorithm uses the results of previous calculations to generate present signal quantities. Because quantization errors are fed back, they can cause limit cycle oscillations.
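A minimal sketch of this effect, assuming a first-order recursion y[n] = Q(a y[n-1]) with zero input, a coefficient a = -0.9, an initial state of 10 LSB, and a hypothetical quantizer Q that rounds to the nearest level with ties away from zero (all chosen for illustration, not taken from this work). With infinite precision the response would decay to zero; with rounding it locks into an oscillation between +5 and -5 LSB. Exact fractions are used so that the result does not depend on floating point round-off.

```python
import math
from fractions import Fraction

def round_half_away(x: Fraction) -> int:
    """Hypothetical quantizer: round to the nearest integer, ties away from zero."""
    magnitude = math.floor(abs(x) + Fraction(1, 2))
    return magnitude if x >= 0 else -magnitude

def zero_input_response(a: Fraction, y0: int, steps: int) -> list[int]:
    """Iterate y[n] = Q(a * y[n-1]) with no input; amplitudes are in LSB units."""
    y = [y0]
    for _ in range(steps):
        y.append(round_half_away(a * y[-1]))   # the rounding error is fed back
    return y

print(zero_input_response(Fraction(-9, 10), 10, 10))
# [10, -9, 8, -7, 6, -5, 5, -5, 5, -5, 5]
```

The state stops decaying at 5 LSB because 0.9 x 5 = 4.5 is rounded back up to 5, so the sign simply alternates forever.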

In Chapter V two new quantization methods are presented: quantization after addition (QAA) and quantization before multiplication (QBM). The former has barely been studied in the literature, and the latter is not mentioned at all.

For the second order filter using fixed point arithmetic, quantization bounds are derived for QAA and for QBM and compared with the results obtained by Yakowitz and S.R. Parker [20-32] for the case of quantization after multiplication (QAM). This study concludes that the bounds for QBM can be at most as large as the bounds for QAA and shows that the bounds for QBM are larger than or equal to the bounds for QAA.
In Appendix D, using Lyapunov's direct method, a quantization bound for QAA in a two-pole, no-zero filter is determined and compared with a value calculated in a previous work by Parker and Hess [1]; the bound now obtained is half as large. Some other advantages of using QBM or QAA in hardware filter implementation are mentioned in the same chapter, and a modification to the present hardware building chips is included that permits roundoff or truncation before multiplication in the implemented filter structure, which is otherwise restricted to truncation after multiplication.
II. DIGITAL CONSIDERATIONS



A. INTRODUCTION 

A digital filter (D.F.) can be constructed from a small set of relatively simple digital circuits, primarily shift registers and adders, which are well suited to large-scale integration (LSI) technology.

In this chapter the advantages of serial, two's complement binary arithmetic in the implementation of digital filters are discussed, and the required shifting and arithmetic operations are described. In particular, the serial/parallel multiplier and its circuits are studied in detail. The effect of sampling an analog signal is shown, and a brief description of simple analog-to-digital and digital-to-analog converter circuits is also included.

B. TWO'S COMPLEMENT NOTATION 

The 2's complement of a binary number is formed by 
simply subtracting each digit (bit) of the number from 1 
and adding a one to the least significant bit (LSB) . Two's 
complement coding of a digital number is used when both 
positive and negative numbers are to be represented. The 
two's complement of a number a, with N data bits, has the 
form 

$$ a_0 . a_1 a_2 a_3 \cdots a_N $$

where the bits $a_i$ are either zero or one.
Since only fractional numbers will be used, the value of a lies between -1 and $1 - 2^{-N}$, and

$$ a = -a_0 + \sum_{i=1}^{N} a_i 2^{-i} $$

The bit $a_0$ is the sign bit and is commonly separated from the other bits by a binary point, as represented in Figure 2-1, and the bit $a_N$ is the least significant bit (LSB).

Positive numbers are coded in simple binary. Negative numbers are formed by taking the two's complement of the corresponding positive numbers.
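The value formula and the negation rule (subtract each bit from 1, then add one LSB) can be checked with a short sketch. The word format "a0.a1...aN" and the function names twos_value and twos_negate are assumptions made for illustration; the first example word is operand A of the serial addition example later in this chapter.

```python
from fractions import Fraction

def twos_value(word: str) -> Fraction:
    """Value of a fractional two's complement word written 'a0.a1a2...aN'."""
    sign, frac = word.split(".")
    value = Fraction(-int(sign))                    # sign bit a0 has weight -1
    for i, bit in enumerate(frac, start=1):
        value += int(bit) * Fraction(1, 2 ** i)     # bit a_i has weight 2^-i
    return value

def twos_negate(word: str) -> str:
    """Two's complement: subtract each bit from 1, then add one LSB (mod 2)."""
    sign, frac = word.split(".")
    bits = sign + frac
    n = len(bits)
    inverted = int("".join("1" if b == "0" else "0" for b in bits), 2)
    result = format((inverted + 1) % (1 << n), f"0{n}b")
    return result[0] + "." + result[1:]

print(twos_value("1.0111101"))    # -67/128
print(twos_negate("0.1010100"))   # 1.0101100
```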

1. Serial Processing

Serial processing of digital numbers is obtained by entering the digital number into sequential circuits one bit at a time, least significant bit first. Parallel processing is accomplished if all bits are entered simultaneously. Gabel [30] has recently presented a parallel arithmetic structure for recursive digital filtering whose main advantage is a processing time independent of word length. Digital filters are generally serial machines, since serial operation offers several advantages:

(i) It can be implemented with less and simpler hardware.

(ii) Carry-propagation delays found in parallel circuits are eliminated.

(iii) The delay operator $z^{-1}$ of the digital filter is easily implemented with a single-input, single-output shift register.

(iv) Serial processing appreciably aids the implementation of multiplexing schemes.
FIGURE 2-1 FORMATTING OF THE BINARY NUMBER
(digital register: sign bit, 0 = +, followed by N data bits)

FIGURE 2-2 THE CYCLIC NATURE OF TWO'S COMPLEMENT ADDITION
2. Advantages of Two's Complement Notation

One advantage of two's complement is that formatted data can be clocked into an arithmetic unit, least significant bit first, with no advance knowledge of the sign of the data [^]. Another advantage is associated with overflow in addition. Overflow in a digital filter occurs in the adder when the sum of two numbers needs more bits than the word provides, so that the sum overflows into the sign bit.

The output during overflow will be in error, but with two's complement notation it can be recovered: if, for instance, more than two numbers are being added, some of the partial sums may overflow while the final sum does not, and that final sum is then correct.

The process of recovering from overflow is illustrated in Figure 2-2, in which the values of the two's complement number are arranged on a circle. Addition of positive numbers causes movement in the clockwise direction, and addition of negative numbers causes movement in the counterclockwise direction. Thus, if positive overflow occurs the result will be a negative number, and if negative overflow occurs the result will be positive. If +1/2 is added to +3/4, the result would be -3/4 due to overflow, but if a third number, -1/2, were added, the result would be +3/4, which is correct. The same holds if one of the inputs has already overflowed in some previous operation.

The range over which the two's complement unit may be considered linear is from -1 to $1 - 2^{-N}$, where $2^{-N}$ represents the least significant bit (LSB) and N the number of data bits in the number.
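The circle of Figure 2-2 amounts to arithmetic modulo 2 on the range [-1, 1). A minimal sketch, assuming exact fractions (only overflow is modeled here, not word-length quantization) and hypothetical helper names, reproduces the +1/2, +3/4, -1/2 example: the middle partial sum overflows, but the final sum is correct.

```python
from fractions import Fraction

def wrap(x: Fraction) -> Fraction:
    """Map x onto the two's complement circle [-1, 1) by adding or subtracting 2."""
    return (x + 1) % 2 - 1

def wrapped_partial_sums(values: list[Fraction]) -> list[Fraction]:
    """Accumulate values the way an overflowing two's complement adder would."""
    acc = Fraction(0)
    partials = []
    for v in values:
        acc = wrap(acc + v)
        partials.append(acc)
    return partials

sums = wrapped_partial_sums([Fraction(1, 2), Fraction(3, 4), Fraction(-1, 2)])
print([str(s) for s in sums])   # ['1/2', '-3/4', '3/4']
```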






3. Number of Bits Required 



The binary representation of a decimal number can be very long (a decimal fraction such as 0.1 has a nonterminating binary expansion). Therefore, the number of bits necessary to represent a decimal number with a known accuracy has to be determined.

Let the decimal number

$$ x = \sum_{j=1}^{D} b_j 10^{-j} $$

scaled such that $|x| < 1$, be known with an accuracy $\Delta x = 10^{-D}$, that is, the exact value X satisfies $(x - \Delta x) < X < (x + \Delta x)$, and let the binary number (considering only the significant bits)

$$ y = \sum_{i=1}^{B} a_i 2^{-i} $$

be the approximation of the decimal number, with an accuracy $\Delta y = 2^{-B}$. Since the accuracy of the binary number has to be at least as great as the accuracy of the decimal number, $2^{-B} \le 10^{-D}$, and it follows that

$$ B \ge D \log_2 10 \approx 3.32\,D \qquad (2.1) $$
Therefore, the number of bits (sign bit excluded) necessary to represent in binary a decimal number of magnitude less than one, with an accuracy up to the fourth decimal place, is given by the first integer larger than the product

$$ 3.32 \times 4 = 13.28 $$

that is, 14 bits.
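Relation (2.1) can be evaluated without the 3.32 approximation by searching for the smallest B with $2^{-B} \le 10^{-D}$ in exact integer arithmetic. A small sketch (the function name is hypothetical):

```python
def bits_required(decimal_places: int) -> int:
    """Smallest B with 2**-B <= 10**-D, i.e. B >= D*log2(10); sign bit excluded."""
    b = 0
    while 2 ** b < 10 ** decimal_places:
        b += 1
    return b

print([bits_required(d) for d in range(1, 5)])   # [4, 7, 10, 14]
```

The D = 4 entry reproduces the 14 bits found above.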



C. ARITHMETIC OPERATIONS 

The only operations which have to be considered for a digital filter implementation are:

(i) Storage or shifting

(ii) Negation

(iii) Addition

(iv) Multiplication

1. Storage

Digital information is stored in a two-state device called a flip-flop, which can remember, or store, one binary bit of information because of its bistable characteristic.

A shift register cell can be implemented using two such flip-flops placed in series and gated alternately, as shown in Figure 2-3. When N shift register cells are placed in series, the output is the input delayed by N clock periods.
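A behavioral sketch of N cascaded cells acting as the delay $z^{-N}$, assuming one bit moves per clock and the register starts cleared; the class and method names are hypothetical, and the two-phase gating of Figure 2-3 is not modeled.

```python
from collections import deque

class ShiftRegister:
    """Hypothetical model of N cascaded shift register cells (requires N >= 1)."""

    def __init__(self, n_cells: int):
        # cells[0] holds the oldest bit; the register starts cleared
        self.cells = deque([0] * n_cells, maxlen=n_cells)

    def clock(self, bit: int) -> int:
        out = self.cells[0]       # the oldest bit leaves the register
        self.cells.append(bit)    # appending to a full deque drops cells[0]
        return out

reg = ShiftRegister(3)
print([reg.clock(b) for b in [1, 0, 1, 1, 0, 0]])   # [0, 0, 0, 1, 0, 1]
```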
FIGURE 2-4 TWO'S COMPLEMENT INVERTER
2. Negation

A very useful method of negating a two's complement number using serial arithmetic is to pass the bits unchanged, starting from the LSB, up to and including the first "1", and to complement every bit that follows it. For example:

    0.1010100
    1.0101100   (the bits to the left of the rightmost 1 are inverted)

The sequential circuit presented in Figure 2-4 uses this method to implement the two's complement inverse. The input enters serially, least significant bit (LSB) first, with the Q output of the flip-flop initially cleared to zero. Until the first 1 arrives, the bits pass unchanged through NAND gates 1 and 3. The first 1 also passes unchanged and changes the flip-flop state at the next clock pulse, so all succeeding bits pass through the inverter and NAND gates 2 and 3. The clear pulse resets the flip-flop after the number has passed.
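The same LSB-first rule can be sketched as a two-state machine in which a boolean plays the role of the flip-flop in Figure 2-4. This is an illustrative model (the function name is hypothetical), not a gate-level description of the circuit.

```python
def serial_negate(bits_lsb_first: list[int]) -> list[int]:
    """Pass bits unchanged up to and including the first 1, then invert the rest."""
    seen_one = False               # models the flip-flop, cleared before each word
    out = []
    for b in bits_lsb_first:
        out.append(b ^ 1 if seen_one else b)
        if b == 1:
            seen_one = True        # takes effect from the next bit onward
    return out

word = "01010100"                  # 0.1010100 with the binary point removed
neg = serial_negate([int(c) for c in reversed(word)])
print("".join(str(b) for b in reversed(neg)))   # 10101100, i.e. 1.0101100
```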

3. Serial Addition 

Serial digital adders have three inputs (2 data and 1 carry) and two outputs (1 sum and 1 carry), as shown in Figure 2-5, and their operation is summarized by the truth table in Table II-1.
TABLE II-1
TRUTH TABLE FOR SERIAL ADDER

          INPUTS                 OUTPUTS
     A      B      C        1 (SUM)   2 (CARRY)
     0      0      0           0          0
     0      0      1           1          0
     0      1      0           1          0
     0      1      1           0          1
     1      0      0           1          0
     1      0      1           0          1
     1      1      0           0          1
     1      1      1           1          1



From this truth table the following logic equations can be obtained:

$$ \mathrm{SUM} = \mathrm{OUTPUT\ 1} = AB'C' + A'BC' + A'B'C + ABC = A(B'C' + BC) + A'(BC' + B'C) $$

$$ \mathrm{CARRY} = \mathrm{OUTPUT\ 2} = A'BC + AB'C + ABC' + ABC = BC + A(B'C + BC') $$

Figure 2-6 shows the logic implementation of the 
above equations. 
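The factored logic equations can be checked exhaustively against ordinary binary addition. In this sketch (function name hypothetical) the complemented inputs A', B', C' are written as 1 - A, and so on; every input combination must satisfy SUM + 2 CARRY = A + B + C.

```python
from itertools import product

def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """SUM and CARRY from the factored logic equations of the serial adder."""
    na, nb, nc = 1 - a, 1 - b, 1 - c      # complemented inputs A', B', C'
    total = (a & ((nb & nc) | (b & c))) | (na & ((b & nc) | (nb & c)))
    carry = (b & c) | (a & ((nb & c) | (b & nc)))
    return total, carry

for a, b, c in product((0, 1), repeat=3):
    print(a, b, c, *full_adder(a, b, c))   # reproduces the rows of Table II-1
```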

Figure 2-7(a) shows a circuit used to implement two's complement addition, involving one full adder and one flip-flop, which acts as the delay element. An inverter is used in the carry circuit of the standard full adder integrated circuit.
FIGURE 2-5 SERIAL ADDER
(inputs A, B and C (carry); outputs 1 (sum) and 2 (carry), the carry returned through a delay)

FIGURE 2-6 SERIAL ADDER LOGIC

FIGURE 2-7 (b) TIMING DIAGRAM
To illustrate the operation of this circuit, an addition of two numbers in two's complement notation will be performed.

    A         1.0111101   (-67/128)
    B         0.0110001   (+49/128)
    A + B     1.1101110   (-9/64)

The corresponding timing diagram of this addition is shown in Figure 2-7(b). Assuming that information transfer takes place when the clock changes from zero to one (positive-going edge), it can be observed that during each clock period the full adder adds the bits of A and B and the carry corresponding to that time, and produces the sum and the carry output. The carry output is delayed by one clock period so that it appears at the carry input during the next time period. A clear pulse zeroes the carry during the first time period.
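A bit-serial sketch of this adder, assuming both words have the same length and are written with an explicit binary point (the function name and word format are hypothetical). The variable carry stands for the delay flip-flop: it is cleared before the first bit, and each new carry is used one clock period later. The final carry out of the sign position is discarded, which gives the modulo-2 behavior of two's complement addition.

```python
def serial_add(word_a: str, word_b: str) -> str:
    """Bit-serial two's complement addition, LSB first, with a carry flip-flop."""
    a_bits = word_a.replace(".", "")
    b_bits = word_b.replace(".", "")
    carry = 0                                        # cleared before the first bit
    out = []
    for a, b in zip(map(int, reversed(a_bits)), map(int, reversed(b_bits))):
        out.append(a ^ b ^ carry)                    # SUM output for this clock
        carry = (a & b) | (a & carry) | (b & carry)  # used during the next clock
    s = "".join(str(bit) for bit in reversed(out))   # final carry is discarded
    return s[0] + "." + s[1:]

print(serial_add("1.0111101", "0.0110001"))   # 1.1101110
```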

The time between the moment the input bit enters and the moment the output bit appears is called the "propagation delay" of the adder. The propagation delay to the sum output is usually larger than that to the carry output.

To avoid synchronization errors, flip-flops are generally needed between adder stages to keep the data synchronized.
4. Multiplication



Multiplication is the most complex and the most time-consuming arithmetic operation required in digital filters. Normal binary multiplication is performed by successive additions and shifts, a process controlled by the multiplier bits: if a bit is 1, the multiplicand is added to the sum of partial products; if it is 0, no addition is performed.

Since the filtering process must operate synchronously, the multiplication must be of fixed time duration. In addition to speed, the amount and complexity of the hardware required to perform multiplication are also important. Considering these factors, the serial/parallel multiplier (SPM), in which serial data are multiplied by a parallel coefficient word, has been used almost exclusively.
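The shift-and-add principle for unsigned integers can be sketched in a few lines (the function name is hypothetical; sign handling is treated later in this section):

```python
def shift_add_multiply(multiplicand: int, multiplier: int, n_bits: int) -> int:
    """Unsigned shift-and-add: add the shifted multiplicand for each 1 bit."""
    product = 0
    for i in range(n_bits):                 # examine the multiplier bits, LSB first
        if (multiplier >> i) & 1:
            product += multiplicand << i    # partial product for bit i
        # a 0 bit contributes no addition
    return product

print(shift_add_multiply(13, 11, 4))   # 143
```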

The serial/parallel multiplier (SPM) accepts an M-bit serial multiplier and an N-bit parallel multiplicand input. Figure 2-8 shows a basic SPM, with the multiplicand bits labeled from the most significant bit (MSB) to the least significant bit (LSB). The multiplier enters serially on line "m", LSB first. The number of adders in this SPM depends on the number of bits of the multiplicand: N-1 full adders are required for an N-bit multiplicand. If a 1 bit appears on the multiplier serial input line m, the stored multiplicand is gated to the adders through the AND gates and the first partial product is generated. Each individual sum at each adder is then delayed one bit time and input to the next adder. The carry from each adder is stored in a flip-flop, which provides a one-bit delay, so that the carry is fed back into the adders during the next clock time. A "0" bit on the multiplier line causes all zeros to be sent to the adders, so that the partial product is also all zeros.

FIGURE 2-8 BASIC SERIAL/PARALLEL MULTIPLIER

The LSB of the product appears at the sum output of the last adder during the first clock period, and the MSB appears at the output during clock time N+M.
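A clock-by-clock behavioral sketch of an unsigned serial/parallel multiplier under stated assumptions: one full adder per multiplicand bit (so it does not reproduce the exact adder count of Figure 2-8), a sum flip-flop and a carry flip-flop per stage, the carry fed back into the same stage, and each sum passed to the next less significant stage one clock later. The function and variable names are hypothetical. After M + N clocks the product has left the last stage, LSB first.

```python
def spm_unsigned(multiplicand: int, n_bits: int, multiplier: int, m_bits: int) -> int:
    """Behavioral model of a basic serial/parallel multiplier (unsigned operands).

    The multiplicand is held in parallel; multiplier bits arrive serially, LSB
    first. Stage k has a sum flip-flop s[k] and a carry flip-flop c[k].
    """
    a = [(multiplicand >> k) & 1 for k in range(n_bits)]
    s = [0] * n_bits
    c = [0] * n_bits
    product = 0
    for t in range(m_bits + n_bits):               # M + N clock periods in total
        m = (multiplier >> t) & 1 if t < m_bits else 0
        new_s, new_c = [0] * n_bits, [0] * n_bits
        for k in range(n_bits):
            x = a[k] & m                            # AND-gated multiplicand bit
            y = s[k + 1] if k + 1 < n_bits else 0   # delayed sum from the next stage
            total = x + y + c[k]                    # full adder with fed-back carry
            new_s[k], new_c[k] = total & 1, total >> 1
        s, c = new_s, new_c
        product |= s[0] << t                        # serial product output, LSB first
    return product

print(spm_unsigned(13, 4, 11, 4))   # 143
```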

The modified version of the basic SPM shown in Figure 2-10 increases the versatility of the device, since it can multiply either positive or negative numbers represented in two's complement code.

The multiplication of a negative multiplicand by a positive multiplier is illustrated in Figure 2-9a. As before, a "1" in the multiplier causes the shifted multiplicand to be added, but because the multiplicand is negative, its sign bit must be spread (extended) to the left to perform the required correction. Thus, when the multiplier bit is "1" and the multiplicand is negative (MSB is 1), 1's must be spread to the left of the MSB of the partial product. When the multiplier bit is "0", the partial product is all zeros, and 0's spread to the left.

The multiplication of a positive multiplicand by a negative multiplier is illustrated in Figure 2-9b. In this case an ordinary multiplication is performed except for the multiplier sign bit. The partial product of the multiplier sign bit has to be negated: since the sign bit (MSB) of the multiplier is "1", the two's complement of the multiplicand is added instead, which achieves the required correction.

(a) Two's complement multiplication of $(+11 \times 2^{-5})(-13 \times 2^{-5}) = -143 \times 2^{-10}$

           1.10011       (-) MULTIPLICAND
           0.01011       MULTIPLIER
      1.1111110011
      1.1111100110
      0.0000000000
      1.1110011000
      0.0000000000
      0.0000000000
      ------------
      1.1101110001

(b) Two's complement multiplication of $(-11 \times 2^{-5})(+13 \times 2^{-5}) = -143 \times 2^{-10}$

           0.01101       MULTIPLICAND
           1.10101       (-) MULTIPLIER
      0.0000001101
      0.0000000000
      0.0000110100
      0.0000000000
      0.0011010000
      1.1001100000
      ------------
      1.1101110001

Figure 2-9

In Figure 2-10 the network at the extreme left, consisting of one AND gate, one OR gate and one type T flip-flop, acts as the sign spreader of the multiplicand. The timing signal T is a single pulse, one clock period long, which occurs at the time the sign bit of the multiplicand appears at the input, so only the sign bit of the multiplicand is gated to the flip-flop. If the multiplicand is positive, its sign bit is zero and this circuit takes no action. If the multiplicand is negative, its sign bit is one, and the T flip-flop, previously reset to the zero state by the clear signal, changes to the one state and holds it for the rest of the multiplication process; the sign of the multiplicand is thereby spread. The clear signal is a single pulse occurring at the time the sign bit of the product appears at the output; its function is to clear all flip-flops before the next multiplication.

$T_0$ is a single pulse occurring during the first time period of the multiplication process. The OR gate in the carry circuit of the first adder and this time signal, $T_0$, are used to subtract the multiplicand as required when the multiplier is negative. If the multiplier is positive, its sign bit will be zero; taking the 4-bit SPM of Figure 2-10, point 5 will then always be zero. The inversion after the delay makes point 7 one, and its sum with point 11 (which is one because $T_0$ at the input of the OR gate is one during the first time period of the multiplication process) generates a carry of one at point 11. Therefore the output 12 of the first adder represents only the A input of the adder.

If the multiplier is negative, point 5 depends on the multiplicand serial input bit present during each time period. This circuit then operates as a two's complement subtracter for the multiplicand.

FIGURE 2-10 4-BIT SERIAL/PARALLEL MULTIPLIER

The sign spreader and the subtracter together perform the corrective measures that enable the SPM to carry out positive, negative and mixed multiplication.

An additional delay flip-flop in the sum output of the last adder, besides compensating for propagation delay, provides an extra delay required when two's complement multiplication is performed. When an N-bit number is multiplied by an M-bit number, the resulting product has M+N+2 bits, but only M+N bits carry magnitude information. The remaining 2 bits indicate the sign of the product, and the redundant sign bit can be eliminated by truncation.
To illustrate the operation of the SPM of Figure 2-10, the following example with a negative multiplicand and a positive multiplier is used.

         1.10110       Multiplicand A = -5/16
         0.110         Multiplier   B = +3/4

      0.00000000
      1.11101100
      1.11011000
      0.00000000
      ----------
      1.11000100       Product AB = -15/64
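The sign correction can be sketched arithmetically: the multiplier $q_0.q_1 \cdots q_n$ has value $-q_0 + \sum q_i 2^{-i}$, so each fractional 1 bit adds a shifted copy of the multiplicand and a 1 sign bit subtracts it. In this sketch (hypothetical names, not a model of the Figure 2-10 timing) Python's signed integers perform the sign spreading automatically; the multiplicand is given as a signed integer in units of $2^{-p}$, and the product comes out in units of $2^{-(p+n)}$.

```python
def twos_multiply(multiplicand: int, multiplier_word: str) -> int:
    """Two's complement multiply with sign-bit correction.

    multiplicand: signed integer, value = multiplicand * 2**-p for some p.
    multiplier_word: 'q0.q1...qn'. Returns the product in units of 2**-(p+n).
    """
    sign, frac = multiplier_word.split(".")
    n = len(frac)
    product = 0
    for i, bit in enumerate(reversed(frac)):   # multiplier bits, LSB first
        if bit == "1":
            product += multiplicand << i       # signed shift: sign is spread
    if sign == "1":
        product -= multiplicand << n           # sign bit: subtract the multiplicand
    return product

def to_word(value: int, frac_bits: int) -> str:
    """Format a value in units of 2**-frac_bits as a one-sign-bit fraction."""
    bits = format(value % (1 << (frac_bits + 1)), f"0{frac_bits + 1}b")
    return bits[0] + "." + bits[1:]

# Example of the text: A = -5/16 = -10 * 2**-5, B = 0.110 (+3/4)
p = twos_multiply(-10, "0.110")
print(p, to_word(p, 5 + 3))   # -60 1.11000100
```

The result, -60/256 = -15/64, agrees with the example; the two cases of Figure 2-9 can be checked the same way.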

A timing chart for this multiplication is presented 
in Figure 2-11, which shows the states of each circuit point 
labeled in Figure 2-10 for each time period. 

Figure 2-11. Timing chart of a two's complement multiplication with multiplicand -5/16 and multiplier +3/4

This multiplier can be expanded to accept serial multiplicand and parallel multiplier numbers of any length [4]; however, the timing signals must be changed accordingly so that they occur in proper correspondence with the serial input number and the product.

In a digital filter the multiplier numbers are the coefficients of the filter transfer function. If a fixed filter is used, the coefficients remain unchanged and the multiplier bits can be hard-wired. If the coefficients are variable, however, external switches may be set to realize a particular filter (this is generally the case in laboratory units), or a read-only memory (ROM) may be used, which is advantageous when the filter is to be multiplexed.

The advantages of using this two's complement serial/parallel multiplier in a digital filter are now evident. There is only an N+1 bit delay (the number of bits of the parallel input), and the multiplication process takes only M+N+2 time periods to complete; since the redundant sign bit can be truncated, a word length of M+N+1 bits can be used. Because flip-flops are placed between the full adders, this type of multiplier largely eliminates propagation delay problems.

D. SAMPLING 

The sampling rate required for a sampler is determined by the analog input signal. If the highest frequency component of the input signal has period T, the minimum sampling rate, called the "Nyquist rate," is 2/T samples per second according to the sampling theorem.

Because of sampling, the spectrum of the original data is scaled and repeated across the entire frequency axis. If the signal is sampled at a rate less than the Nyquist rate, or in other words, if the spectrum of the input signal is not limited to $\pm\omega_s/2$, where $\omega_s$ is the sampling frequency, a distortion due to overlapping sidebands occurs, as observed in Figure 2-12b. This effect is called "folding" or "aliasing." Since the information lost by folding cannot be recovered, care should be taken in the design of a digital filter. A practical limit of $\pm\omega_s/5$ for the spectrum of the input signal has been found at the Naval Electronic Laboratory Center [13]. Digital filter applications are therefore better suited to narrowband signals.

Figure 2-12a. Narrowband Signal Before and After Sampling

Figure 2-12b. Wideband Signal Before and After Sampling
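Folding can be made concrete with a short sketch: after sampling at $f_s$, a sinusoid at frequency f is indistinguishable from one at the folded frequency in [0, $f_s$/2]. The 7 Hz signal and 10 samples per second below are illustrative values, and the function name is hypothetical.

```python
import math

def alias_frequency(f: float, fs: float) -> float:
    """Apparent frequency (between 0 and fs/2) of a sinusoid at f sampled at fs."""
    f_mod = f % fs
    return min(f_mod, fs - f_mod)

fs = 10.0
print(alias_frequency(7.0, fs))   # 3.0
# The samples of a 7 Hz and a 3 Hz cosine coincide at 10 samples per second
same = all(math.isclose(math.cos(2 * math.pi * 7 * n / fs),
                        math.cos(2 * math.pi * 3 * n / fs), abs_tol=1e-9)
           for n in range(20))
print(same)   # True
```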

E. CONVERSION 

1. Analog to Digital Conversion

The analog-to-digital converter (ADC) generates a digital number proportional to the amplitude of each pulse from the sampler by comparing the input amplitude with a reference, which is generally produced by a digital-to-analog converter (DAC), as shown in Figure 2-13. The parallel inputs to the D/A come from an up/down counter which seeks zero error at the comparator input. To hold the input constant during the conversion process, the ADC must be preceded by a sample/hold circuit, which holds the sampled level until the next sample is taken. Since most ADCs, like the one described, have parallel outputs, the result must be converted to a serial number, using a parallel-in serial-out shift register, before it enters the digital filter.
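A minimal behavioral sketch of the up/down-counter converter, assuming an ideal comparator and an ideal n-bit DAC with full scale v_ref. The function counter_adc and its parameters are hypothetical; the model steps the counter once per clock until the DAC output lies within one LSB below the held input, or the counter reaches a limit.

```python
def counter_adc(v_in: float, n_bits: int, v_ref: float, count: int = 0) -> int:
    """Up/down counter driving a DAC until its output brackets the held input."""
    lsb = v_ref / 2 ** n_bits
    top = 2 ** n_bits - 1
    for _ in range(2 ** n_bits):          # at most one count step per clock
        dac = count * lsb
        if v_in >= dac + lsb and count < top:
            count += 1                    # comparator: DAC output too low
        elif v_in < dac and count > 0:
            count -= 1                    # comparator: DAC output too high
        else:
            break                         # dac <= v_in < dac + lsb, or at a limit
    return count

print(counter_adc(2.3, 4, 8.0))   # 4
```

Passing the previous count as the starting value models a converter that tracks a slowly varying input, since only a few clock steps are then needed per sample.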

2. Digital to Analog Conversion

D/A conversion is generally a simpler process than A/D conversion. The basic digital-to-analog converter produces a distinct output voltage for each digital input. This is commonly done as shown in Figure 2-14, using a resistor network with one resistor connected to each bit of the input digital number. The resistor values are weighted so that the current through each resistor is proportional to the value of the corresponding input bit (the resistance is inversely proportional to the bit weight). The resulting currents are then summed by an operational amplifier to produce a level proportional to the value of the input digital number.

Current Summing with an Operational Amplifier to Obtain a Digital-to-Analog Conversion
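An idealized sketch of the binary-weighted current-summing converter, assuming an ideal inverting op-amp and normalized resistances (bit i, with i = 1 the MSB, drives a resistor r·2^(i-1)). With the hypothetical defaults r = 1 and r_f = r/2, the output is $-v_{ref}\sum b_i 2^{-i}$.

```python
def weighted_resistor_dac(bits_msb_first: list[int], v_ref: float = 1.0,
                          r: float = 1.0, r_f: float = 0.5) -> float:
    """Inverting summing amplifier fed by binary-weighted resistors (ideal parts).

    Bit i (i = 1 is the MSB) drives a resistor r * 2**(i-1), so its current
    v_ref / (r * 2**(i-1)) is proportional to the bit weight 2**-i.
    """
    current = 0.0
    for i, bit in enumerate(bits_msb_first, start=1):
        if bit:
            current += v_ref / (r * 2 ** (i - 1))
    return -r_f * current      # op-amp output: v = -R_f * (sum of input currents)

print(weighted_resistor_dac([1, 0, 1, 1]))   # -0.6875, i.e. -11/16
```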
III. DIGITAL IMPLEMENTATION. HARDWARE DESIGN CONSIDERATION



A. INTRODUCTION 

The realization of a digital filter involves three main synthesis steps:

(i) Approximating the ideal filter transfer function, either by classical means followed by a convenient Z-transform technique [12], by an optimization algorithm that minimizes, for example, a square error criterion in the frequency domain [26], or by any other direct design method, to obtain a discrete filter which satisfies the given specifications.

(ii) Quantizing the multiplier coefficients of the filter in the appropriate cascade, parallel or hybrid form so as to minimize cost and complexity while still satisfying the filter specifications.

(iii) Selecting a specific configuration for the digital filter, specifying the word length, the arithmetic mode (only fixed point is considered in this work), the quantization type (roundoff or truncation) and where in the circuit it takes effect (generally after multiplication), so as to satisfy the specifications relating to quantization noise.

B. QUANTIZATION EFFECTS 

When a D.F. is implemented with special purpose hardware (or on a computer), errors and constraints due to finite word length are unavoidable. These quantization effects must be considered both in deciding what word length (or register length) is needed for a given filter implementation and in choosing between several possible implementations of the same filter design, which will be affected differently by quantization.

There are four main errors due to quantization effects:

(i) Input quantization, producing A/D conversion errors;

(ii) Arithmetic quantization, generating noise by the roundoff or truncation of quantities after arithmetic operations;

(iii) Quantization of the filter coefficients, producing a pole-zero displacement;

(iv) Constraints on signal levels imposed by the need to prevent overflow.

The effects of these errors and constraints vary depending upon the arithmetic used.
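The difference between the two arithmetic quantization types can be sketched with exact fractions. Dropping the low-order bits of a two's complement word rounds toward minus infinity, so the truncation error lies in (-q, 0], while rounding to the nearest level keeps the error within q/2 in magnitude, where q is the quantization step. The function name and the choice of rounding ties upward are assumptions made for illustration.

```python
import math
from fractions import Fraction

def quantize(x: Fraction, frac_bits: int, mode: str) -> Fraction:
    """Quantize x to a step q = 2**-frac_bits by truncation or rounding."""
    scaled = x * 2 ** frac_bits
    if mode == "truncate":
        level = math.floor(scaled)                    # dropping LSBs rounds toward -inf
    elif mode == "round":
        level = math.floor(scaled + Fraction(1, 2))   # nearest level, ties upward
    else:
        raise ValueError("mode must be 'truncate' or 'round'")
    return Fraction(level, 2 ** frac_bits)

x = Fraction(5, 16) * Fraction(3, 4)      # exact product 15/64 needs 6 fraction bits
print(quantize(x, 4, "truncate"), quantize(x, 4, "round"))   # 3/16 1/4
```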

Weinstein and Oppenheim [22] have shown that floating point arithmetic is generally less noisy than fixed point arithmetic, and floating point is known to provide a greater dynamic range. Fixed point arithmetic, however, is much easier to implement and its error analysis is much less involved; it is therefore the mode more often addressed in the literature. A discussion and bibliography of the literature concerning these error effects appears in [18-23-24]. Quantization noise due to roundoff after multiplication has been studied by stochastic [5-6] and deterministic methods [1-7-8-9], assuming uncorrelated noise sources. Under the more general assumption of correlated noise sources, a stochastic method has been studied by S.R. Parker and P. Girard [25].
Mitra and Sherwood [21] have proposed a technique for estimating the pole-zero displacement due to coefficient quantization in fixed point arithmetic. E. Avenhaus [27] has presented a method to find canonical structures which minimize the coefficient sensitivity due to rounding errors when a small coefficient word length is used. Knowles and Olcayto [19] have indicated a method of analyzing the response of a D.F. affected by coefficient accuracy, using a "stray" transfer function in parallel with the corresponding ideal filter, but this method is not suitable for cascade realizations.

C. WORD LENGTH REQUIREMENTS 

When a filter is constructed with digital hardware, the minimum word lengths needed for a specified performance accuracy must be determined. This is one of the most important and difficult decisions in a digital design.

Figure 3-1 shows the relationship between the word lengths (number of bits in the number, sign bit excluded) in the input word (C), in the serial word being processed within the arithmetic unit (M) and in the multiplier coefficients (N). When the sign bit is included, these word lengths will be represented by C', M' and N', respectively.

1. Input Data Word Length (C)

The input word length is the word length of the data out of the A/D converter. It is therefore related mainly to the input quantization error in the sampling A/D conversion process, and it determines the granularity, or the number of quantization levels, required of the A/D converter.

FIGURE 3-1 WORD LENGTHS IN A DIGITAL FILTER

The size of the quantization step used, h, depends principally on the dynamic range and on the granularity of the A/D converter. The dynamic range is the ratio between the largest signal, or saturation level, X_sat, and the smallest detectable signal, or threshold level, X_th.

Considering only the dynamic range dependence, the quantization step

$$h = \frac{X_{th}}{X_{sat}}$$

must be equal to the LSB with an accuracy of C significant bits, or

$$h = 2^{-C}$$

therefore

$$C = \log_2\left(\frac{X_{sat}}{X_{th}}\right) \qquad (3.1)$$

Considering only the granularity of the A/D conversion, and assuming that an additive white noise is introduced at the converter, resulting in a noise figure F expressed in dB, the following equation can be obtained [33]:

$$C = \frac{F - 10\log_{10}\left(12\,\sigma_s^2\right)}{20\log_{10} 2} \qquad (3.2)$$

where σ_s² represents the mean square level of the signal.

As a design criterion, the signal may be assumed to have a Gaussian amplitude distribution with a standard deviation of 1/3 (so that σ_s² = 1/9), and then equations (3.1) and (3.2) result in

$$C' = C + 1 = \max\left\{\left[1 + \log_2\frac{X_{sat}}{X_{th}}\right],\ \left[1 + \frac{F + 10\log_{10}(3/4)}{20\log_{10} 2}\right]\right\} \qquad (3.3)$$
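Equation (3.3) is easy to evaluate for a given converter specification. The sketch below is a minimal illustration, assuming the granularity term has the form [F + 10 log10(3/4)] / (20 log10 2) (quantization noise h²/12 with σ_s = 1/3) and that the larger requirement is rounded up to a whole number of bits. The function name and the example figures (a dynamic range of 1000, i.e. 60 dB, and F = 60 dB) are hypothetical.

```python
import math


def input_word_length(sat_over_th: float, noise_figure_db: float) -> int:
    """Input word length C' (sign bit included), following equation (3.3).

    sat_over_th     -- dynamic range X_sat / X_th as a plain ratio (not dB)
    noise_figure_db -- required noise figure F in dB
    """
    # Dynamic-range requirement, eq. (3.1): C = log2(X_sat / X_th)
    c_range = math.log2(sat_over_th)
    # Granularity requirement with sigma_s = 1/3, i.e. 12 * sigma_s**2 = 4/3
    c_noise = (noise_figure_db + 10 * math.log10(3 / 4)) / (20 * math.log10(2))
    # Assumption: round the larger requirement up to whole bits, then add the sign bit
    return 1 + math.ceil(max(c_range, c_noise))


print(input_word_length(1000, 60))  # -> 11
```

Here the dynamic range term (about 9.97 bits) dominates the granularity term (about 9.76 bits), so 10 magnitude bits plus a sign bit are needed.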



2. Computational Data Word Length (M)

As mentioned previously, the arithmetic quantization noise is unavoidable and may be very significant in a D.F., and all the methods of analysis presently available are quite complex. Fettweis [17] has observed that the roundoff (or truncation) noise depends only on the word length (M) at the input of the D.F.; therefore M - C extra bits (all zeros initially) are appended to the A/D converter output.

The serial/parallel multiplier described later can handle any word length (M); however, if the coefficient word length (N) remains the same, a longer M reduces the sampling rate, and hence the speed of the process, as indicated by equation (3.8). Also, as will be shown later, the number of shift registers used in the hardware filter implementation increases as M increases.
3. Multiplier Word Length (N)



The multiplier coefficient length is associated with the accuracy with which the poles and zeros may be placed or, in other words, with the tolerances of the filter design.

Multipliers with low sensitivity can be implemented with fewer bits, hence yielding a circuit with potentially lower cost and higher speed. Since first and second order sections are the building blocks being used, only the results on coefficient accuracy for this case will be presented.

According to [3], a first order filter with a pole or zero (s + a) with a tolerance of ±Δa requires a corresponding multiplier word length

$$N \ge \log_2\left[\frac{e^{aT}}{T\,\Delta a}\right] \qquad (3.4)$$

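and for a second order filter with a complex conjugate pair of poles at s = -σ0 ± jω0, with a characteristic equation in the z plane given by

$$1 + a z^{-1} + b z^{-2} = z^{-2}(z - z_1)(z - z_2) = 0$$

where

$$a = -2r\cos\theta, \qquad b = |z_{1,2}|^2 = r^2, \qquad z_{1,2} = r e^{\pm j\theta}, \qquad r = e^{-\sigma_0 T}, \qquad \theta = \omega_0 T$$

For the tolerances ±Δσ and ±Δω, the word length of the coefficient multipliers has to be:

for a:

$$N \ge -\log_2\left[2T e^{-\sigma_0 T}\left(|\cos\omega_0 T|\,\Delta\sigma + |\sin\omega_0 T|\,\Delta\omega\right)\right] \qquad (3.5)$$

for b:

$$N \ge -\log_2\left[2\,\Delta\sigma\,T\,e^{-2\sigma_0 T}\right] \qquad (3.6)$$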
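The relations above can be turned into a quick estimate. The sketch below computes a and b from σ0, ω0 and T, and estimates N for each coefficient by requiring the quantization step 2^-N not to exceed the coefficient change caused by the tolerances, using first-order sensitivities (the worst-case sum of the Δσ and Δω contributions for a). This criterion, the function names and the numerical example (σ0 = 100 s^-1, ω0 = 2π·1000 rad/s, T = 0.1 ms, Δσ = Δω = 1) are assumptions made for illustration.

```python
import math


def pole_coefficients(sigma0: float, omega0: float, T: float) -> tuple[float, float]:
    """Coefficients (a, b) of 1 + a z^-1 + b z^-2 for poles at s = -sigma0 +/- j omega0."""
    r = math.exp(-sigma0 * T)          # pole radius in the z plane
    return -2.0 * r * math.cos(omega0 * T), r * r


def coefficient_bits(sigma0, omega0, T, d_sigma, d_omega) -> tuple[int, int]:
    """Hypothetical estimate of (N for a, N for b) from first-order sensitivities."""
    r = math.exp(-sigma0 * T)
    theta = omega0 * T
    # |da| ~ 2 T r (|cos theta| d_sigma + |sin theta| d_omega), worst case
    da = 2 * T * r * (abs(math.cos(theta)) * d_sigma + abs(math.sin(theta)) * d_omega)
    # |db| ~ 2 T r^2 d_sigma, since b = exp(-2 sigma0 T)
    db = 2 * T * r * r * d_sigma
    return math.ceil(-math.log2(da)), math.ceil(-math.log2(db))


a, b = pole_coefficients(100.0, 2 * math.pi * 1000.0, 1e-4)
print(a, b, coefficient_bits(100.0, 2 * math.pi * 1000.0, 1e-4, 1.0, 1.0))
```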

As will be observed later, the number of serial/parallel multipliers used will depend on this word length (N).

D. GAIN SCALING 

Overflow occurs when a D.F. computes a number that is 
too large to be represented in the arithmetic used in the 
filter. If no compensation is made for the overflow, then 
large errors in the filter output will result. 

Several techniques are used to compensate for or to avoid overflow. One method is to detect overflow and then compensate for it immediately after it occurs. If a positive overflow is detected, a large negative number is injected into the filter, and if a negative overflow is detected, a large positive number is injected. The overflow will then be compensated due to the cyclic nature of two's complement arithmetic, and no error will occur. Another method is saturation arithmetic, where a sum that is too large to be represented is set equal to the largest representable number in the filter. The output will be in error, but overflow oscillations will be avoided.
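The cyclic property of two's complement arithmetic mentioned above can be illustrated with a small sketch. Assuming 8-bit integers for brevity (the filter itself uses much longer words), it shows that a chain of additions whose final result is representable gives the correct answer even when an intermediate sum wraps around, whereas saturating the intermediate sum leaves an error. The helper names are hypothetical.

```python
def wrap(value: int, bits: int) -> int:
    """Reduce an integer to a `bits`-bit two's complement number (cyclic overflow)."""
    v = value & ((1 << bits) - 1)
    return v - (1 << bits) if v >> (bits - 1) else v


def saturate(value: int, bits: int) -> int:
    """Clamp an integer to the `bits`-bit two's complement range."""
    hi = (1 << (bits - 1)) - 1
    lo = -(1 << (bits - 1))
    return max(lo, min(hi, value))


BITS = 8
# 100 + 50 - 60 = 90 is representable, but the partial sum 150 is not.
wrapped = wrap(wrap(100 + 50, BITS) - 60, BITS)
clamped = saturate(saturate(100 + 50, BITS) - 60, BITS)
print(wrapped, clamped)  # -> 90 67
```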

The most common method of preventing overflow is scaling. The simplest form of scaling is effectively to reduce the size of the input signal. However, if the analog input is reduced, the signal-to-noise ratio will usually be decreased. It is therefore usually more desirable to reduce the digital input signal with a scaler between the A/D converter and the filter input. This scaler can be a shift register, which effectively divides by powers of two, or a multiplier whose coefficient is less than one. This last approach will be the one used. In fact, every second order filter section will be preceded by a scaling multiplier (K) that will be set just low enough to prevent overflow at any adder. Thereby, linearity is assured while maximizing the dynamic range of each section and consequently of the filter. This is achieved by seeking a value of K such that, for all possible digital filter inputs X(z), the output of each adder, Y_i(z), will satisfy
$$\max_{z = \exp(j\omega T)} \left|Y_i(z)\right| < 1 \qquad (3.7)$$



E. TIMING 

Timing is another requirement in digital filter design, since sequential circuits are used. The "filter word" length (the number of time periods required to process one input word before the next word may be entered) has to be determined. Mathematically, the filter word length corresponds to the delay operator z^-1 which appears in the desired D.F. transfer function. As will be shown in the examples presented later, the filter word length is a function of the multiplication time and is generally given as (M' + N') bits, where M' and N' are respectively the number of bits used to represent the computational data word and the scaling coefficient in the multiplier (sign bit included). The maximum word rate (sampling rate) at which the filter can operate is then

$$f_w = \frac{f_b}{M' + N'} = \frac{f_b}{M + N + 2} \qquad (3.8)$$

where f_b is the bit rate, determined by the system clock rate, and (M' + N') is generally referred to as the word time.
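Equation (3.8) is what limits the word lengths in the example designs that follow. A minimal sketch with hypothetical function names: given M' and N' it returns the maximum word rate, and given the bit rate and a required sampling rate it returns the longest admissible word time.

```python
def max_word_rate(bit_rate_hz: float, m_prime: int, n_prime: int) -> float:
    """Maximum sampling (word) rate f_w = f_b / (M' + N'), equation (3.8)."""
    return bit_rate_hz / (m_prime + n_prime)


def max_word_time(bit_rate_hz: float, word_rate_hz: float) -> int:
    """Largest word time M' + N' (in bits) that still meets the required word rate."""
    return int(bit_rate_hz // word_rate_hz)


print(max_word_time(1e6, 10e3))           # -> 100
print(round(max_word_rate(1e6, 30, 17)))  # -> 21277
```

With a 1 MHz bit rate and a 10 KHz sampling rate, the word time M' + N' can be at most 100 bits.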
F. HARDWARE DESIGN



The following discussion on hardware implementation will be restricted to MOS/LSI¹ technology. Two types of MOS/LSI chips developed by the North American Rockwell Microelectronics Company (NRMEC) will be presented, and a design method for second order filter sections will be introduced. This method will be illustrated with a low pass digital filter example using a z-transform technique.

1. The Devices

The North American Rockwell Microelectronics Company (NRMEC) has developed two LSI processing devices which operate on two's complement formatted serial digital data, as well as LSI compatible analog-to-digital and digital-to-analog converters. Table III-1 presents the characteristics of these MOS/LSI digital filter building blocks. Filters may be configured using these devices over the frequency range of 0 to 20 KHz.

The serial/parallel multiplier (SPM) and the shift register adder (SRA) are the processing devices. These MOS/LSI devices use p-channel enhancement mode transistors. A four phase clock scheme is required to operate both the SPM and the SRA.

a. Serial/Parallel Multiplier (SPM) 

One SPM chip forms the sign-corrected product of an input data word of any length and a scaling coefficient of length up to 8 bits plus sign. Longer coefficient multiplications can be performed by cascading SPM chips.

¹MOS technology refers to a device with three layers: metal-oxide-semiconductor. LSI means large-scale integration.

Characteristics                      SPM          SRA          A/D-D/A
Size (mils)                          142 x 136    180 x 216    180 x 180
Frequency (MHz)                      1.5          1.0          1.0
Power dissipation (mW at 1 MHz)      35 max       200 max      75 max
Output drive capability              100 pF       50 pF        100 pF
Voltage (clock, input, supply)       -30 V max    -30 V max    -30 V max
Number of devices (MOSFETs)          640          1250         1800
Mechanized terms                     322          410          11 bit
Number of pins (flat pack)           42           42           42

Table III-1. Characteristics of LSI digital filter devices from North American Rockwell Microelectronics Company

The scaling coefficient (multiplicand) can be loaded in parallel or serially and transferred to a parallel holding register. In digital filter applications the scaling coefficient is generally input serially at SI1, least significant bit (LSB) first, by changing the TRS input from "0" to "1" one bit after the sign bit has been input, as observed in the timing diagram of Figure 3-3. The serial word (multiplier) is input LSB first at the MI1 input, and input TSS should be taken to "1" for one bit at the same time as the sign bit appears on the MI1 input. A "1" on the TMR signal clears the adders and sign bit circuitry and holds the output at "0". The LSB of the multiplier should be input 2 bits after this TMR signal.

From Figure 3-2 it can be observed that the LSB of the product appears at the output (SO1 or SO2) one bit after the LSB of the multiplier input signal enters the MI1 input. For an N'-bit coefficient multiplicand, the multiplication process will produce a delay of N' bits at the SO1 output. In Figure 3-3, a 9-bit delay between the sign bit of the multiplier input and the product output is observed for the 9-bit (8 + sign) scaling coefficient (multiplicand) used.

The multiplier performs proper sign correction only if the inputs (data and scaling coefficients) both have magnitudes less than unity. This potential problem can generally be solved in a practical mechanization, as will be shown.

Figure 3-2. Block Diagram of 65001NA Serial/Parallel Multiplier

Figure 3-3. Timing Diagram for 8-Bit-Plus-Sign Multiplier and Multiplicand in Minimum-Time Cyclic Operation
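The arithmetic performed by the SPM can be modelled, without any of its timing or pin-level detail, as a shift-and-add over the serial word taken LSB first, with the sign bit given negative weight, which is what a sign-corrected two's complement product requires. The sketch below uses integers (the binary point is implicit) and hypothetical function names; it is not a description of the chip's internal logic.

```python
def lsb_first_bits(value: int, width: int) -> list[int]:
    """Two's complement bits of `value`, least significant bit first (serial order)."""
    return [(value >> i) & 1 for i in range(width)]


def serial_parallel_product(serial_bits: list[int], coefficient: int) -> int:
    """Shift-and-add product of a serial word (LSB first) and a parallel coefficient.

    The last serial bit is the sign bit and carries weight -2**(width-1);
    subtracting instead of adding there is the sign correction.
    """
    sign_position = len(serial_bits) - 1
    product = 0
    for i, bit in enumerate(serial_bits):
        if bit:
            partial = coefficient << i   # coefficient shifted to bit position i
            product += -partial if i == sign_position else partial
    return product


print(serial_parallel_product(lsb_first_bits(-5, 8), 7))    # -> -35
print(serial_parallel_product(lsb_first_bits(100, 8), -3))  # -> -300
```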

b. Shift Register Adder (SRA) 

As shown in the block logic diagram of Figure 3-4 and in the simplified functional diagram of Figure 3-5, an SRA consists of two identical 7 to 15 bit shift-and-hold registers, two 4-input adders, and timing and control circuitry.

Each adder exhibits a one-bit time delay. One of the adders is able to inhibit two inputs if the input CNI is made "1". Both adders are reset by a "1" on control inputs TR1 and TC21.

The length of each register section can be adjusted, by coding the inputs A, B and C, to accommodate the length of the data word in the computational loop. A shift register longer than 15 bits is obtained by cascading these register sections. In particular, a delay of up to 30 bits can be obtained by cascading the two sections of a single SRA chip.

The timing and control section provides the proper timing signals not only to the SRA but also to the multipliers that may be associated with that SRA. The timing signals T1 and T2 are the only required timing inputs.

2. Canonic Realization of Second Order Sections

Given a linear time invariant system, it is shown in Appendix B that its transfer function can be expressed as a parallel, cascade or hybrid realization of first and second order transfer function sections.
Figure 3-4. Block Logic Diagram of 65007NA/B

Figure 3-5. Simplified Logic Organisation of Shift Register/Adder



The canonic form is the one generally used to 
realize second order sections, since minimizing the number 
of operations (particularly multiplications) corresponds to 
a minimum number of noise error sources due to quantization 
(round-off or truncation) within the D.F. 

P. Girard [25], extending a previous work by Parker and Hess [2], has shown that, from the state equations and associated transfer functions

x(n) = A x(n-1) + B u(n-1)
v(n) = C x(n-1) + d u(n-1)

$$H_S(z) = d + \frac{c\,z^{-1} + e\,z^{-2}}{1 + a z^{-1} + b z^{-2}} \qquad (3.9)$$

and

x(n) = A x(n-1) + B' u(n-1)
v(n) = x_1(n) + d' u(n-1)

$$H'(z) = d' + \frac{c' + e'\,z^{-1}}{1 + a z^{-1} + b z^{-2}} \qquad (3.10)$$

there are 36 canonic realizations for d = 1, 36 for d = 0, 22 for d' = 1 and 22 for d' = 0.

The most general form of the transfer function of a second order filter can be expressed as

$$H(z) = \frac{V(z)}{U(z)} = K\,\frac{1 + a_1 z^{-1} + b_1 z^{-2}}{1 + a z^{-1} + b z^{-2}} \qquad (3.11)$$

from which eq. (3.9) can be obtained by dividing the denominator into the numerator in ascending powers of z^-1. Equation (3.10) can also be obtained from eq. (3.11), if b ≠ 0, by dividing the denominator into the numerator in descending powers of z^-1.

Only poles and zeros within the unit circle (in the z plane) will be considered, since this corresponds to minimum phase, stable filters. Therefore the magnitudes of the coefficients b_1 and b are less than unity, and the magnitudes of the coefficients a_1 and a are less than two.

Equation (3.11) is easily mechanized in the S_3 form [2], also called the SM11 form [25], as shown in Figure 3-6. z^-1 is the unit delay operator and the multiplier gains are the coefficients K, a/2, b, a_1/2 and b_1:

M0 sets the scaling coefficient (K).

M1 sets a/2, which affects the resonant frequency of the pole.

M2 sets b, which affects the damping of the pole.

M3 sets a_1/2, which affects the frequency of the zero.

M4 sets b_1, which affects the depth of notch of the zeros.

Since a and a_1 can be as large as two, the multipliers M1 and M3 are set at half value but summed twice at the adders. This assures that the multipliers will perform the proper sign correction, since all inputs will be less than unity.
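The effect of setting M1 and M3 at half value and summing their outputs twice can be checked numerically. The sketch below assumes that the SM11 structure of Figure 3-6 is the usual canonic arrangement, with a single two-stage delay line (w(n-1), w(n-2)) shared by poles and zeros; that assumption, the function names and the test coefficients (taken from the low pass example later in this chapter) are illustrative. It compares the impulse response with a direct evaluation of the difference equation of (3.11).

```python
def sm11_section(x, K, a, b, a1, b1):
    """Canonic second order section with halved multipliers a/2 and a1/2,
    each of whose outputs is added twice."""
    w1 = w2 = 0.0                          # delay line: w(n-1), w(n-2)
    y = []
    for xn in x:
        m1 = (a / 2) * w1                  # M1
        m3 = (a1 / 2) * w1                 # M3
        w0 = K * xn - m1 - m1 - b * w2     # M0, M1 added twice, M2
        y.append(w0 + m3 + m3 + b1 * w2)   # M3 added twice, M4
        w2, w1 = w1, w0
    return y


def difference_equation(x, K, a, b, a1, b1):
    """Direct evaluation: y(n) = K[x(n) + a1 x(n-1) + b1 x(n-2)] - a y(n-1) - b y(n-2)."""
    x1 = x2 = y1 = y2 = 0.0
    y = []
    for xn in x:
        yn = K * (xn + a1 * x1 + b1 * x2) - a * y1 - b * y2
        y.append(yn)
        x2, x1 = x1, xn
        y2, y1 = y1, yn
    return y


coeffs = dict(K=0.06755, a=-1.14216, b=0.41244, a1=2.0, b1=1.0)
impulse = [1.0] + [0.0] * 49
h_sm11 = sm11_section(impulse, **coeffs)
h_ref = difference_equation(impulse, **coeffs)
print(max(abs(p - q) for p, q in zip(h_sm11, h_ref)))  # on the order of round-off
```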

This configuration is capable of realizing real and complex pairs of poles and zeros within the unit circle.

FIGURE 3-6 RECURSIVE CANONICAL REALIZATION OF A SECOND ORDER FILTER SECTION IN SM11 FORM

FIGURE 3-7 DISTRIBUTION OF GAINS AND DELAYS ON THE SECOND ORDER LOW PASS FILTER EXAMPLE
3. Example of a Low Pass Digital Filter Design

Assume that a digital filter with a 10 KHz sampling rate is required, such that it is flat to within 3 dB in the passband from 0 to 1,000 Hz and more than 10 dB down at frequencies beyond 2,000 Hz. The filter must also be monotonic in passband and stopband.

Observing that a Butterworth filter can meet the above requirements in the analog domain, and taking advantage of the knowledge of analog design, the use of a transform technique seems convenient. The bilinear transform will be used, because it is the most applicable for constant magnitude passband and stopband, as mentioned in Appendix B.

But since the bilinear z-transform warps the frequency scale, a counter warp will be used in the design of the analog filter, substituting each critical frequency ω_c by (2/T) tan(ω_c T/2).

Since

T = 1/f_w = 1/(10 KHz)

each counter-warped critical frequency will be

$$\omega'_1 = \frac{2}{T}\tan\left[\frac{(2\pi)(1\ \mathrm{KHz})}{2\,(10\ \mathrm{KHz})}\right] = \frac{2}{T}(0.3249)$$

$$\omega'_2 = \frac{2}{T}\tan\left[\frac{(2\pi)(2\ \mathrm{KHz})}{2\,(10\ \mathrm{KHz})}\right] = \frac{2}{T}(0.7265)$$
The cutoff frequency is specified by the 3 dB point; in this case, then,

$$\omega_c = \frac{2}{T}(0.3249)$$



Applying the Butterworth analog design method

$$\left(\frac{V_p}{V}\right)^2 = 1 + \left(\frac{x}{x_c}\right)^{2n}$$

where, for a low pass filter, x = ω and x_c = ω_3dB = ω_c, and
V_p is the peak amplitude,
V is the amplitude at a given point x,
n is the order of the filter.

Since V_p/V = 10 dB, (V_p/V)² = 10, and the order of the filter can be obtained from

$$1 + \left(\frac{0.7265}{0.3249}\right)^{2n} \ge 10$$

giving n = 2.
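The counter-warping and the choice of order can be reproduced in a few lines. This is a sketch with a hypothetical function name; it works with frequencies normalized by 2/T, so only the tangents appear, and it returns the smallest integer n satisfying the inequality above.

```python
import math


def butterworth_order(fs, f_pass, f_stop, atten_db):
    """Prewarped (normalized) edges and minimum Butterworth order for a bilinear design."""
    wc = math.tan(math.pi * f_pass / fs)   # omega*T/2 = pi*f/fs
    ws = math.tan(math.pi * f_stop / fs)
    ratio_sq = 10 ** (atten_db / 10)       # (Vp/V)^2 at the stopband edge
    n = math.ceil(math.log10(ratio_sq - 1) / (2 * math.log10(ws / wc)))
    return wc, ws, n


wc, ws, n = butterworth_order(10_000.0, 1_000.0, 2_000.0, 10.0)
print(round(wc, 4), round(ws, 4), n)  # -> 0.3249 0.7265 2
```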



Then

$$|H(\omega)|^2 = \frac{1}{1 + (\omega/\omega_c)^4}$$

and

$$H(s) = \frac{1}{(s/\omega_c)^2 + \sqrt{2}\,(s/\omega_c) + 1}$$
Replacing s by (2/T)(z - 1)/(z + 1), and since ω_c = (2/T)(0.3249), yields the required transfer function in the z-domain

$$H(z) = \frac{0.0675569\,(z^2 + 2z + 1)}{z^2 - 1.14216\,z + 0.41244}$$



which can be written in the form of equation (3.11)

$$H'(z) = K\,\frac{1 + 2z^{-1} + z^{-2}}{1 - 1.14216\,z^{-1} + 0.41244\,z^{-2}}$$

where

a = -1.14216
b = 0.41244
a_1 = 2
b_1 = 1

and K is the scaling factor necessary to avoid overflow.
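The bilinear substitution can also be carried out symbolically for a general second order Butterworth section and then evaluated. The sketch below (hypothetical function name) uses c = tan(π f_c / f_w), so that s/ω_c = (1/c)(1 - z^-1)/(1 + z^-1), and returns the gain and denominator coefficients in the form of equation (3.11). Evaluated for this example it gives a ≈ -1.1430, b ≈ 0.4128 and a gain of about 0.0675, which agree with the coefficients above to within about 10^-3, and a DC gain of exactly one.

```python
import math


def bilinear_butterworth2(fs: float, f_cut: float) -> tuple[float, float, float]:
    """Second order Butterworth low pass via the prewarped bilinear transform.

    Returns (gain, a, b) with
    H(z) = gain * (1 + 2 z^-1 + z^-2) / (1 + a z^-1 + b z^-2).
    """
    c = math.tan(math.pi * f_cut / fs)   # prewarped cutoff times T/2
    q = math.sqrt(2.0) * c
    d0 = 1.0 + q + c * c                 # z^0 coefficient before normalization
    gain = c * c / d0
    a = (2.0 * c * c - 2.0) / d0
    b = (1.0 - q + c * c) / d0
    return gain, a, b


gain, a, b = bilinear_butterworth2(10_000.0, 1_000.0)
print(f"{gain:.5f} {a:.5f} {b:.5f}")
```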



$$K \le \frac{|\mathrm{Denominator}|_{min}}{|\mathrm{Numerator}|_{max}} = \frac{1 - |a| + b}{1 + |a_1| + b_1} = \frac{1 - |{-1.1422}| + 0.4124}{1 + |2| + 1} = 0.06755$$
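For this filter the bound can be cross-checked numerically: sampling z^-1 = e^(-jωT) over 0 ≤ ωT ≤ π and taking the reciprocal of the largest |Numerator/Denominator| gives essentially the same value, since the overall response of this low pass peaks at z = 1. The grid size and function name in the sketch are arbitrary choices.

```python
import cmath
import math


def max_gain(a, b, a1, b1, points=4096):
    """Largest |N(z)/D(z)| on the unit circle, sampled at points+1 frequencies in [0, pi]."""
    best = 0.0
    for k in range(points + 1):
        zinv = cmath.exp(-1j * math.pi * k / points)   # z^-1 = exp(-j omega T)
        num = 1 + a1 * zinv + b1 * zinv * zinv
        den = 1 + a * zinv + b * zinv * zinv
        best = max(best, abs(num / den))
    return best


a, b, a1, b1 = -1.14216, 0.41244, 2.0, 1.0
k_bound = (1 - abs(a) + b) / (1 + abs(a1) + b1)   # bound used in the text
k_numeric = 1 / max_gain(a, b, a1, b1)             # reciprocal of the peak gain
print(round(k_bound, 5), round(k_numeric, 5))
```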
Using the mechanization shown in Figure 3-6, it can be observed that, with the multiplier coefficients previously calculated (a_1/2 = 1 and b_1 = 1), the multipliers M3 and M4 are not necessary. Therefore a realization of the type presented in Figure 3-7 will be attempted. The timing distribution calculation will give the delays (D1, D2, D3 and D4) required of the shift registers.

Assuming the same accuracy in all multiplier coefficients, each multiplier will present an N'-bit delay and each adder a 1-bit delay. For a computational word length M', a restriction is given by equation (3.8): since the chips cannot operate at a bit rate higher than 1 MHz and a sampling rate of 10 KHz is required, the word time M' + N' must not exceed 100 bits.

Since the data at © must be in word synchronization with the data at ©, but delayed one word time,

1 + D1 + N' = M' + N', then D1 = M' - 1

and similarly with the data at the next pair of points,

D1 + D2 = M' + N', then D2 = N' + 1

The data at © has to be delayed two word times from the data at © and be in word synchronization with it:

1 + D1 + D2 + D3 + N' = 2(M' + N'), then D3 = M' - 1

Finally, comparing the data at the last two points, we obtain

D3 + D4 = M' + N', then D4 = N' + 1
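The delay bookkeeping above, together with the 15-bit maximum length of an SRA register section, determines how many SRA chips are needed. The sketch below encodes the four results and counts register sections; the function names are hypothetical, and it assumes, as the text does, two register sections per SRA chip, with each delay built from cascaded sections of at most 15 bits.

```python
def section_delays(m_prime: int, n_prime: int) -> tuple[int, int, int, int]:
    """Shift register delays D1..D4 of the low pass example (results derived in the text)."""
    d1 = m_prime - 1
    d2 = n_prime + 1
    d3 = m_prime - 1
    d4 = n_prime + 1
    return d1, d2, d3, d4


def sra_chips(delays, max_section_bits=15, sections_per_chip=2) -> int:
    """Number of SRA chips when each delay is built from cascaded register sections."""
    sections = sum(-(-d // max_section_bits) for d in delays)   # ceiling division
    return -(-sections // sections_per_chip)


delays = section_delays(30, 17)
print(delays, sra_chips(delays))  # -> (29, 18, 29, 18) 4
```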

For a precision of 5 decimal places in the multiplier coefficients, equation (2.1) indicates the need for 17 bits. One SPM chip permits only a coefficient of up to 8 bits plus sign; two SPM chips permit up to 16 bits plus sign (N' = 17 bits). Each multiplication will therefore be realized by cascading two SPMs, and six SPM chips will be required.

The computational word length (M') has to be larger than the word length out of the A/D converter and should be made large enough to compensate for truncation errors in the filter computation. Choosing M' = 30 bits and recalling that each SRA chip provides two separate shift registers capable of delaying up to 15 bits, it can be concluded from the timing calculations made previously that four SRA chips are required, since D1, D2, D3 and D4 need 29, 18, 29 and 18-bit delays, respectively.

However, a better solution can be achieved using only two SRA chips and an extra multiplier (M3). This multiplier is set with a fixed coefficient of minus one in order to permit two additions and two subtractions at the output of the SRA, as shown in Figure 3-8. Therefore, D2 = N' + 1 bit delays are obtained with the N'-bit delay of the multiplication process plus the one bit delay available from the previous shift register, which uses an M' - 1 bit delay.

In order to obtain D4 = N' + 1 bit delays, the shift register of the multiplier M3 is used, giving N' bit delays, and as before one bit is available from the previous shift register (D3 = M' - 1).

For the chosen word lengths M' = 30 bits and N' = 17 bits, only four SPM multipliers and two SRA chips will be required, rather than three SPM multipliers and four SRA chips.

FIGURE 3-8 BLOCK DIAGRAM OF A SECOND ORDER LOW PASS FILTER IMPLEMENTATION SHOWING TIMING DISTRIBUTION.



IV. DESIGN OF A SECOND ORDER DIGITAL FILTER SECTION USING THE SM11 TRANSPOSE FORM

A. INTRODUCTION 

A second order building section in the SM11T form (transpose of SM11) has been designed to operate with the digital filter laboratory unit built by S.A. White of the North American Rockwell Electronics Group.

In order to permit the same parameter variations, the designed section is capable of a computational word length (M') from 16 to 30 bits and a multiplier coefficient word length (N') of 12, 14 or 17 bits. The lengths of both these words, as mentioned previously, affect the accuracy and the speed of the digital filter. The clock frequency is variable between 25 KHz and 1 MHz. The filter sampling rate is related to the previous variables by equation (3.8).

The second order building block implements the following expression

$$Y(z) = K\,\frac{1 + a_1 z^{-1} + b_1 z^{-2}}{1 + a z^{-1} + b z^{-2}}\,X_1(z) + X_2(z) - X_3(z) - X_4(z) + X_5(z) - X_6(z) - X_7(z) \qquad (4.1)$$

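The following state equation

x(n) = A x(n-1) + B u(n-1)
v(n) = C x(n-1) + d u(n-1)

for a single input, single output second order filter, leading to the S type transfer function indicated in equation (3.9), can be written in the form

$$\begin{bmatrix} x_1(n) \\ x_2(n) \\ v(n) \end{bmatrix} = \begin{bmatrix} A & B \\ C & d \end{bmatrix} \begin{bmatrix} x_1(n-1) \\ x_2(n-1) \\ u(n-1) \end{bmatrix} \qquad (4.2)$$

where the 3 x 3 array is formed from A (2 x 2), B (2 x 1), C (1 x 2) and d.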



P. Girard [25] has introduced the canonical arrays, which correspond to the idea of canonical realization given by Jackson [4].

The transpose array has the following form:

(4.3)

This is a canonical array, since its realization minimizes the number of operations required, therefore leading to smaller quantization errors. This realization satisfies equation (4.2) for the canonical array (4.3) and the defined state vector x(n), as shown in Figure 4-1.

The coefficients a_1 and b_1 are related to those of the transfer function (3.9): a_1 = a + c and b_1 = b + e. For this realization d = 1.
FIGURE 4-1 CANONIC REALIZATION OF A SECOND ORDER SECTION BASED UPON THE SM11 TRANSPOSE ARRAY

B. STRUCTURE MECHANIZATION



The design will be restricted to stable, minimum phase filters. Stability implies poles within the unit circle in the z-plane or, in the parameter plane, |a| < 2 and |b| < 1. Minimum phase implies zeros within the unit circle, or |a_1| < 2 and |b_1| < 1. Since for proper multiplier operation the magnitude of the coefficient has to be less than one, some arrangement has to be made. In the multipliers M2 and M4 the coefficients introduced will be a_1/2 and a/2, respectively, but, as observed in Figure 4-2, the second half of adder number one, A1(2), will sum twice the output coming from the first half of the shift register, SR(1), which delays the resulting information not only from M2 and M4 but also from M1 and M3. Therefore the coefficients of these last multipliers will be set at b_1/2 and b/2, respectively.

The block diagram mechanization presented in Figure 4-2 minimizes the number of devices required to perform an SM11 transpose form realization for the required specifications.

The truncation in a D.F. is generally represented after each multiplication; however, the NRMEC chips perform the truncation at the input of each adder. No problem will occur if the realization is of the SM11 form, as shown previously in Figure 3-6. However, in a transpose realization the scaling coefficient multiplier, M0, is cascaded with other multipliers. The truncation could simply be realized with an AND-gate controlled by a signal composed of a string of ones M' bits long. The first half of adder number one, A1(1), has been utilized instead, since of the three SRA chips needed, only five adders were used. A1(1) also provides the necessary bit delay to obtain the synchronization at the first half of adder number two, A2(1). The two adders of chip number three, A3(1) and A3(2), facilitate the interconnection of other filter sections in parallel.

Multiplier M5, with a fixed scaling coefficient of -1, has been introduced in order to provide an N'-bit delay to the signal coming out of A2(2). Since the shift register of M5 is free, due to its fixed coefficient, it will be used to delay the synchronization signal by N' bits.

FIGURE 4-2 BLOCK DIAGRAM OF A SECOND ORDER FILTER MECHANIZATION IN THE SM11T FORM SHOWING TIMING DISTRIBUTION

C. SHIFT REGISTER TIMING

The next step towards the implementation of this filter section is to determine the timing requirements. A computational word length of M' bits and a multiplier coefficient of N' bits correspond to a multiplier output of (M' + N') bits; therefore a word time z^-1 = (M' + N') bits is established. As before, each multiplier will be treated as presenting an effective delay of N' bit times, and each adder as producing a one bit time delay.

The delay provided by the shift register SR(1) has to be such that the data at its output are in word synchronization with, but delayed one word time from, the data entering A1(1). Then the 1-bit delay at A1(1), plus the N'-bit delay at M2, plus the 1-bit delay at A2(1), plus the delay at SR(1) has to be equal to one word time, (M' + N') bits, or

1 + N' + 1 + delay SR(1) = M' + N'

then

delay SR(1) = M' - 2



Similarly, the delay provided by SR(2) has to be such that the data at © are in word synchronization with, but delayed one word time from, the data at the corresponding reference point of Figure 4-2. Starting from the signal at that point,

N' + 1 + N' + delay SR(2) = N' + (M' + N')

then

delay SR(2) = M' - 1



Since the computational word length M' can be as large as 30 bits, one entire SRA chip, or two halves, is required for each delay SR(1) and SR(2).
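The timing relations of this section can be tabulated over the whole range of word lengths supported by the design. The sketch below (hypothetical names) computes the word time, the two shift register delays and the (2N' + 4)-bit reset delay used for adders A2(1) and A2(2), and checks that SR(1) and SR(2) never exceed the 30 bits available from two cascaded 15-bit register sections.

```python
def transpose_section_timing(m_prime: int, n_prime: int) -> dict[str, int]:
    """Bit-time delays of the SM11 transpose section (relations from the text)."""
    word_time = m_prime + n_prime
    sr1 = word_time - (1 + n_prime + 1)            # A1(1) + M2 + A2(1) + SR(1) = one word time
    sr2 = (n_prime + word_time) - (n_prime + 1 + n_prime)
    return {"word_time": word_time, "sr1": sr1, "sr2": sr2,
            "reset_a2": 2 * n_prime + 4}           # T1 delay for RES A2(1), RES A2(2)


for n_prime in (12, 14, 17):
    for m_prime in range(16, 31):
        t = transpose_section_timing(m_prime, n_prime)
        assert t["sr1"] == m_prime - 2 and t["sr2"] == m_prime - 1
        assert max(t["sr1"], t["sr2"]) <= 30       # two 15-bit sections suffice

print(transpose_section_timing(30, 17))
# -> {'word_time': 47, 'sr1': 28, 'sr2': 29, 'reset_a2': 38}
```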

Next, it is necessary to verify that the two data streams entering A2(2) are in word synchronization. In fact, starting from ©, a delay of 1 + N' is obtained at one input via M1, and the same delay is obtained at the other input via M3.

From Figure 4-2 it can be observed that the output presents a delay of (N' + 1) bits with respect to the input. Thus, for a synchronization input signal T1, the corresponding synchronization output is T1 d^(N'+1), where d represents a one bit delay time.

Figure 4-3 presents the wiring diagram of this filter section. The small numbers inside each box represent the pin numbers of the MOS chips. The multipliers are used in pairs to obtain the required coefficient accuracy (N' up to 17 bits); thus M0 becomes M01 and M02, etc. All multiplier shift registers are wired in series for serial loading of the multiplier coefficients. The scaling coefficient word is read into this shift register cyclically.

The box marked T in the shift register adders represents the timing sections of these devices. Each T-section provides the proper timing signals not only to its own SRA but also to the associated multipliers. SRA1-T receives as inputs the signals T1 and T1 d^N', and since an output with a 1-bit delay, T1 d^(N'+1), is required, SRA1 with the type B pin configuration has to be used. For similar reasons, SRA2 will also be type B.

FIGURE 4-3 ASSEMBLY WIRING DIAGRAM OF A SECOND ORDER FILTER SECTION IMPLEMENTED IN THE SM11T FORM WITH NRMEC BUILDING CHIPS

D. TIMING DIAGRAM 

In order to illustrate the processing of the signal through the filter and obtain a timing diagram, the maximum word lengths for the computational loop (M' = 30) and for the multiplier coefficients (N' = 17) will be assumed and, without loss of generality, an input data signal of 15 bits plus sign will be considered.

The timing at the points marked with circled numbers in Figure 4-2 is illustrated in Figure 4-4. The data enters the scaling multiplier M0 at word time I, LSB first, with the sign bit 16 bits later. This data is shown shaded so that the propagation of that word through the filter can be traced by following the shaded data.
The data at © are represented by a longer data word than the one at the M0 input, because the multiplier generates a double-precision product (15 + 16 + 1 = 32 bits), delayed N' = 17 bits. The data out of M0 is therefore longer than the computational word length. The truncation to 30 bits will occur at the input of adder A1(1). The reset signal for this adder, shown on the line RES A1(1) of Figure 4-4, is off for only 30 bits, eliminating the first two bits (the LSBs) input to the adder. The data through A1(1) will be delayed 1 bit and, as indicated at ©, will be 30 bits long, feeding the multipliers M1 and M2.
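The word growth and truncation just described follow from two's complement arithmetic: the product of an M'-bit and an N'-bit word needs M' + N' - 1 bits (apart from the single case where both operands are the most negative number), and discarding the first, least significant bits of a serial word is an arithmetic right shift, i.e. truncation toward minus infinity. A minimal integer sketch with hypothetical names:

```python
def product_bits(m_prime: int, n_prime: int) -> int:
    """Width of the double-precision product, as counted in the text."""
    return m_prime + n_prime - 1


def truncate_lsbs(word: int, width: int, keep: int) -> int:
    """Keep the `keep` most significant bits of a `width`-bit two's complement word.

    Python's >> on negative integers is an arithmetic shift, so this truncates
    toward minus infinity, as dropping the first serial bits does.
    """
    return word >> (width - keep)


print(product_bits(16, 17), product_bits(30, 17))           # -> 32 46
print(truncate_lsbs(-5, 32, 30), truncate_lsbs(7, 32, 30))  # -> -2 1
```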

The data at © and (8), after being weighted by the multipliers M1 and M2 respectively, will be M + N + 1 = M' + N' - 1 = 30 + 17 - 1 = 46 bits long and delayed N' = 17 bits from the multiplier input data at ©.



The data at (4) must be in word synchronization with the data at the other input of A2(2). The required truncation to 30 bits is performed at the adder input. The reset signal, RES A2(2), has to be T1 delayed 38 bits or, in general, T1 d^(2N'+4). The data at the output of this adder, (5), is then 30 bits long and 1 bit delayed from its inputs.

The data at ©, due to the multiplication process, will again be 46 bits long and delayed 17 bits from the input at ©. The shift register SR(2), implemented with the second halves of SRA numbers one and two, as shown in Figure 4-3, will delay the data by M' - 1 = 29 bits.



The data at © must be in word synchronization with, and delayed one word time from, the data at © and (3) inputting A2(1). These data inputs are truncated to the computational word length at the input of this adder. A reset signal T1 d^(2N'+4) = T1 d^38 is required.

The data at © passes through the shift register SR(1), so that its output will be delayed M' - 2 = 28 bits from ©.

The data at the SR(1) output has to be in word synchronization with the data at ©, inputting A1(2). Here, the truncation will affect only the data at ©, since the data from SR(1), resulting from delaying the output of an adder, retains the computational word length.

The data output of this filter section can be added to six more data inputs provided from other filter sections for a parallel realization, or cascaded with identical sections for a series realization.



E. DESIGN OF A SHIFT REGISTER CONTROLLED BY THE COEFFICIENT WORD LENGTH

As seen previously, a reset signal consisting of T_d delayed by (2N' + 4) bits is required for both adders A2(1) and A2(2). Since all multipliers are capable of controlling N', the shift register part of M5 can be used, because its coefficient (minus one) is fixed. In Figure 5-3 the output pin 3 of M5 provides a signal T_d delayed N' bits. Unfortunately, no other multiplier shift register is available to obtain a shift register controllable by the coefficient word length.






Figure 4-5a shows the wiring connections to a third shift register which can delay a signal by (N' + 2) bits. Figure 4-5b presents the design of a diode matrix able to control the length coding of that shift register.

The coefficient word length (N') can have the values 12, 14 and 17. If 12 bits are chosen, all shift register length-coding inputs will be zeros, and a 7-bit delay is obtained at each stage, resulting in an output 14 bits delayed. If a 14-bit coefficient word length is chosen, the multiplier selector switch, set at 14, will put "1" on line B2, all other inputs remaining "0", and SR3(2) will then produce a 9-bit delay, resulting in an output 16 bits delayed. If the multiplier selector switch is set at 17, lines B1, CC1 and B2 will go to "1"; a 10-bit delay will then be produced at SR3(1) and a 9-bit delay at SR3(2), resulting in an output 19 bits delayed. In each case the total delay equals N' + 2 bits.
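The length coding described above can be tabulated and checked against the (N' + 2)-bit requirement. The sketch below is a behavioural model of the diode matrix only (the dictionary stands in for the wiring; it is not a pin-level description), with the stage delays taken from the text.

```python
# Stage delays (SR3(1), SR3(2)) selected by the diode matrix for each N',
# as listed in the text; a behavioural model, not the actual pin coding.
STAGE_DELAYS = {12: (7, 7), 14: (7, 9), 17: (10, 9)}


def sr3_delay(n_prime: int) -> int:
    """Total delay of the two-stage shift register SRA 3 for a given N'."""
    first, second = STAGE_DELAYS[n_prime]
    return first + second


for n_prime in sorted(STAGE_DELAYS):
    print(n_prime, sr3_delay(n_prime), n_prime + 2)
```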

The shift register used (SRA 3) is package type A, since type B, having a different pin connection, would not permit the proper code combination.



F. MULTIPLIER TIMING SIGNALS 

The sign bit timing, TSS, is a one-bit signal which goes to "1" at the same time as the sign bit of the data appears at the multiplier serial input. Thus, for the multiplier M0, TSS 0 appears at the 16th bit time, together with the sign bit at the M0 input, and cyclically one word length (M' + N' = 47 bits) later. The signals TSS 1, 2, 3, 4 for the multipliers M1, M2, M3 and M4 and the signal TSS 5 for the multiplier M5 are obtained similarly.

FIGURE 4-5
(a) SHIFT REGISTER (TYPE A) CONNECTION TO OBTAIN (N' + 2) BIT DELAY
(b) COEFFICIENT WORD LENGTH DIODE MATRIX

The timing signal TMR is a one-bit signal which goes to "1" two bit times before the LSB of the data appears at the multiplier serial input; the multiplication starts at that time. Thus, TMR 0 appears at the 46th bit time, two bits before the LSB appears at the M0 input. TMR 1, 2, 3, 4 for M1, M2, M3, M4 and TMR 5 for M5 are obtained similarly.

The multiplicand transfer signal, TRS, transfers the serial multiplier coefficient input to a parallel register after the whole coefficient has been input. TRS goes to "1" for one bit, one bit after the sign bit of the multiplier coefficient has been input. Thus, TRS 0 appears at the 33rd bit time, one bit later than the sign bit of COEFF M0. TRS 1, 2, 3, 4 are obtained similarly with respect to COEFF M1, M2, M3, M4. The multiplier M5 does not need the TRS signal, since its coefficient (-1) is fixed.
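The three timing signals are one-bit pulses that repeat every word of M' + N' = 47 bits. A minimal sketch, assuming bit times are counted from 1 as in the text ("16th bit time"), lists the pulse positions for M0 and builds the corresponding 0/1 waveform; the helper names are hypothetical.

```python
WORD_LENGTH = 47  # M' + N' bits per word


def pulse_times(first_bit: int, n_words: int, word_length: int = WORD_LENGTH) -> list[int]:
    """Bit times (1-based) at which a one-bit timing signal is "1"."""
    return [first_bit + k * word_length for k in range(n_words)]


def waveform(first_bit: int, n_bits: int, word_length: int = WORD_LENGTH) -> list[int]:
    """The same signal as a 0/1 sequence over bit times 1..n_bits."""
    return [1 if (t - first_bit) % word_length == 0 else 0 for t in range(1, n_bits + 1)]


tss0 = pulse_times(16, 3)  # sign-bit timing for M0
tmr0 = pulse_times(46, 3)  # start of multiplication for M0
trs0 = pulse_times(33, 3)  # coefficient transfer for M0
print(tss0, tmr0, trs0)
```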

Although not represented in Figure 4-3, all data and synchronization filter outputs should have a buffer circuit to provide convenient output isolation. The design of these buffer circuits and of other controls applicable to this design is not included, since it is treated in [28].
V. QUANTIZATION AFTER ADDITION AND QUANTIZATION
BEFORE MULTIPLICATION. ERROR BOUNDS



A. INTRODUCTION 

When a digital filter is implemented, errors due to finite precision in the representation of the numbers always occur. The word length after a multiplier or an adder is in general larger than the original word length. The increase of word length after an adder, which results in "overflow", can be avoided by proper scaling at the input of the filter, as shown before. Therefore only the increase of word length after multiplication will be treated.

Up to now, the realization of D.F.'s has been done almost exclusively using special purpose computers. Thus, in order to reduce storage, quantization is performed exactly when the number of bits increases, that is, after multiplication. Almost all of the literature has been dedicated to the case of quantization after multiplication, using either a stochastic approach [5, 6, 20] or a deterministic one [1].

For hardware implementation of D.F.'s, for instance using the SRA (shift register adder) and SPM (serial/parallel multiplier) chips from NRMEC, it is possible to keep all M' + N' - 1 bits resulting from the multiplication of an N'-bit multiplier by an M'-bit multiplicand (sign bits included) until after the next addition, because two consecutive multiplications will not occur (otherwise a single one would suffice). It is possible to go even further, carrying the M' + N' - 1 bits until the next multiplication is performed. This leads to two new methods of performing the quantization, namely quantization after addition (QAA) and quantization before multiplication (QBM).
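The M' + N' - 1 figure can be checked exhaustively for small word lengths. The sketch below, using plain integers for two's complement operands (an illustrative assumption, not a model of the SPM itself), lists every operand pair whose product does not fit in M' + N' - 1 bits. Only one pair fails: both operands at their most negative value. For a fractional coefficient this means a coefficient of exactly -1, which is excluded by the requirement from Chapter III that the coefficient magnitude be less than one.

```python
def fits(value: int, bits: int) -> bool:
    """True if value is representable as a `bits`-bit two's complement number."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def overflowing_products(m: int, n: int) -> list[tuple[int, int]]:
    """(multiplicand, multiplier) pairs whose product needs more than m + n - 1 bits.

    Operands are m-bit and n-bit two's complement integers (sign bits included).
    """
    multiplicands = range(-(1 << (m - 1)), 1 << (m - 1))
    multipliers = range(-(1 << (n - 1)), 1 << (n - 1))
    return [(x, c) for x in multiplicands for c in multipliers
            if not fits(x * c, m + n - 1)]


print(overflowing_products(6, 4))   # [(-32, -8)]
```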

QAA has been addressed only recently [10, 11], where it was shown that, for the case of magnitude truncation, a second order D.F. has almost no limit cycles. QBM has not been mentioned in the literature before.

It can be observed that, for hardware implementation of D.F.'s using, for instance, NRMEC chips, the filter word length and the storage of the devices for QAA and QBM are exactly the same as for QAM (quantization after multiplication). For QAM the adder would be active for M' bits (the word length of the computational loop in the filter) and off for the remaining N' bits of the filter word length (one delay z^-1 = M' + N' bits). For QAA or QBM, however, the adder will be active for the M' + N' - 1 bits resulting from the previous multiplication.

B. ADVANTAGES OF QAA AND QBM 

It will be proved later that QAA produces a quantization error bound no larger than that of QAM, and that the error bound for QBM is smaller than or equal to that of QAA. In Appendix C, Lyapunov's direct method is applied to find the amplitude bound of the limit cycles in the second order D.F. assuming QAA. The result obtained is half of that determined by Parker and Hess [1] for the case of QAM.

Another advantage of using QAA or QBM over QAM is shown next.

In Chapter III it was mentioned that the magnitude of the multiplier coefficient has to be less than one in order to allow proper operation of the SPM. From the examples presented in Chapters III and IV, it has been observed that whenever the magnitude of the multiplier coefficient is as large as two, as is common practice, one can introduce one half of the multiplier coefficient and add the multiplier output twice at the next adder, as shown in Figure 5-1a.

If finite arithmetic is now considered, the output quantization errors for QAM and for QAA will be different. Consider, for instance, an input signal weighted by a coefficient a (|a| < 2), with rounding to a quantization step h. For QAM, the maximum error introduced after the multiplication will be |e_1| = h/2 and, since the output of the multiplier is added twice at the adder, as shown in Figure 5-1b, the maximum output error will be h. For QAA, as shown in Figure 5-1c, the maximum magnitude of the output error will be h/2, that is, two times smaller than for QAM.
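The h versus h/2 comparison can be reproduced numerically. The sketch below assumes exact rational arithmetic, h = 1/16, coefficients on a 1/16 grid with |a| < 2, quantized inputs with |x| ≤ 1, and rounding to the nearest multiple of h with ties away from zero. It applies a/2 and adds the product twice, rounding either the product (QAM) or the sum (QAA), and records the worst output error of each scheme.

```python
from fractions import Fraction
import math


def round_to_step(value: Fraction, h: Fraction) -> Fraction:
    """Round to the nearest multiple of h (ties away from zero)."""
    q = value / h
    k = math.floor(abs(q) + Fraction(1, 2))
    return h * (k if q >= 0 else -k)


def worst_errors(h: Fraction) -> tuple[Fraction, Fraction]:
    """Largest output error of QAM and QAA when a (|a| < 2) is applied as a/2 twice."""
    coeffs = [Fraction(k, 16) for k in range(-31, 32)]   # |a| < 2
    inputs = [Fraction(k, 16) for k in range(-16, 17)]   # quantized inputs, |x| <= 1
    worst_qam = worst_qaa = Fraction(0)
    for a in coeffs:
        for x in inputs:
            exact = a * x                                # ideal output
            half = (a / 2) * x                           # multiplier output with a/2
            y_qam = 2 * round_to_step(half, h)           # round the product, add it twice
            y_qaa = round_to_step(half + half, h)        # add twice, round the sum once
            worst_qam = max(worst_qam, abs(exact - y_qam))
            worst_qaa = max(worst_qaa, abs(exact - y_qaa))
    return worst_qam, worst_qaa


print(worst_errors(Fraction(1, 16)))
```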

C. HARDWARE MODIFICATIONS TO PERFORM QBM

For the reasons presented above, a hardware design able to perform QBM is desirable. The NRMEC chips described in Chapter III can only perform truncation before each addition, which is equivalent to QAM (truncation) if two multipliers are not cascaded. However, the NRMEC chips can easily be modified so that they are able to perform truncation or rounding before multiplication.

Figure 5-1. Advantage of QAA over QAM When the Magnitude of the Multiplier Coefficient Is Larger than One, Shown for |a| < 2: (a) coefficient applied as a/2 with the product added twice; (b) QAM, with a rounding error of up to h/2 on each added product; (c) QAA.

1. Serial/Parallel Multiplier Performing Truncation or Rounding Before Multiplication

One way to obtain QBM, using truncation or rounding as desired, is to precede each SPM with a circuit as shown in Figure 5-2. It consists of one full adder and two flip-flops acting as delay elements. An inverter is used in the carry circuit of the standard full adder integrated circuit.

Another way is to design a new SPM with the circuit described above included within the chip, as shown in Figure 5-3. Since the present SPM chip has 34 pads and is mounted in a 42-lead pack, the three new inputs required (t, MI2 and r) can easily be assigned to the available package pins.

The operation of the circuit presented in Figure 5-3 can be described as follows. Due to a previous multiplication, the input to a multiplier can be as large as M' + N' - 1 bits, where M' and N' represent, respectively, the number of bits of the computational loop within the filter and the number of bits of the multiplier coefficient (sign bit included). At the beginning of the present multiplication this data input cannot be larger than the computational word length (M'), so that no more than M' + N' - 1 bits appear at the output, avoiding overlap with the next word. Truncation or rounding is therefore required at the data input MI2 to reduce this information to M' bits or, in other words, to eliminate up to N' - 1 bits. If truncation is desired, the input "r" is grounded and the input "t" receives a signal as shown in Figure 5-4: a string of ones M' bits long, starting M' bits before the sign bit is input at MI2. The output of AND gate number 2 will always be zero, and AND gate number 1 will eliminate any information until "t" goes to "1". Then, for an (M' + N' - 1)-bit input, the first N' - 1 bits will be suppressed, and the information entering the SPM will have M' bits and will be delayed 1 bit by the adder.

Figure 5-2. Two's Complement Truncation/Rounding Circuit

Figure 5-3. Modified SPM to Perform Truncation or Rounding Before Multiplication

If the input data already has M' bits or less, no bit will be eliminated through input MI2, but the 1-bit delay at the adder will still exist. In order to eliminate the delay in this case, the input MI1 has been made available.

If rounding is required, both the signals "t" and "r" will be present. The rounding signal, "r", is a 1-bit signal which goes to "1" M' bits before the sign bit of the data input at MI2. This signal will appear at the input of gate 2 at the same time as the most significant bit (MSB) of the information being eliminated, which will be the only information passing gate 2. The output of gate 1 will truncate the input to M' bits as before, but now the previous MSB will be added to the LSB of the M'-bit information. Thus a rounded M'-bit data word will enter the SPM.
Figure 5-4. Timing Signals for the Modified SPM Shown in Figure 5-3 (multiplier input MI2 carrying an (M' + N' - 1)-bit word, of which the first (N' - 1) bits are eliminated and the last M' bits kept; rounding signal marking the MSB of the information to be eliminated; truncating signal)
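The gate-level behaviour described above can be imitated in software. The sketch below is an idealized bit-serial model, assuming LSB-first data, a truncating signal t that is "1" for the last M' bits of the word (sign bit included), and no modelling of the 1-bit adder latency. Gate 1 passes data only while t is "1"; gate 2 passes the bit marked by r, which a flip-flop holds for one bit so that the serial full adder adds it to the LSB of the kept bits. An integer model (arithmetic shift plus round bit, wrapped to M' bits) is included as a cross-check; both functions are illustrative, not the chip design.

```python
def to_bits(value: int, width: int) -> list[int]:
    """Two's complement bits of value, LSB first (serial order)."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits: list[int]) -> int:
    """Value of LSB-first two's complement bits."""
    value = sum(bit << i for i, bit in enumerate(bits))
    return value - (1 << len(bits)) if bits[-1] else value


def quantize_before_multiply(word: int, m: int, n: int, rounding: bool) -> int:
    """Reduce an (m + n - 1)-bit word to m bits as it streams into the SPM."""
    drop = n - 1                                  # bits to eliminate (the LSBs)
    out, carry, held = [], 0, 0
    for i, bit in enumerate(to_bits(word, m + n - 1)):
        t = int(i >= drop)                        # truncating signal
        r = int(rounding and i == drop - 1)       # rounding signal: MSB of eliminated bits
        gate1, gate2 = t & bit, r & bit
        if t:
            total = gate1 + held + carry          # serial full adder
            out.append(total & 1)
            carry, held = total >> 1, 0
        else:
            held = gate2                          # one-bit delay element
    return from_bits(out)                         # final carry discarded (m-bit result)


def reference(word: int, m: int, n: int, rounding: bool) -> int:
    """Integer model: arithmetic shift (truncation) plus optional round bit, wrapped to m bits."""
    drop = n - 1
    q = word >> drop
    if rounding and drop:
        q += (word >> (drop - 1)) & 1
    q &= (1 << m) - 1
    return q - (1 << m) if q >> (m - 1) else q


print(quantize_before_multiply(23, 4, 3, False), quantize_before_multiply(23, 4, 3, True))  # 5 6
```

Truncation of two's complement numbers rounds towards minus infinity (23/4 = 5.75 gives 5, while -23/4 gives -6); adding the MSB of the discarded bits turns this into rounding to the nearest value.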
2. SRA Circuitry for Quantization Before Multiplication

The shift register adder chip itself requires no alteration for the QBM operation; only the reset signal going to the adders must be modified. As shown in the timing diagrams of Figures 3-3 and 4-4, this reset signal was "0" during the M' bits prior to the input of the sign bit of the data being added, and "1" for the remaining N' bit times. Therefore the addition process was performed only during the last M' bits. For the QBM operation, the adder has to be active during (M' + N' - 1) bits. The reset signal then has to be "0" during the (M' + N' - 1)-bit information entering the adder or, in other words, it will be a 1-bit signal going to "1" on the bit following the input of the sign bits of the data to the adder.
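The change of reset signal can be shown as one word of 0/1 values. A minimal sketch, assuming bit times are counted from the bit after the previous sign bit (so that the data sign bit occupies the last position of the word), with the hypothetical helper `reset_word`:

```python
def reset_word(m_prime: int, n_prime: int, active_bits: int) -> list[int]:
    """Adder reset signal over one word of M' + N' bit times.

    Position 0 is the bit after the previous sign bit; the data sign bit is
    in the last position. The reset is "0" (adder active) during the last
    `active_bits` bit times and "1" otherwise.
    """
    word = m_prime + n_prime
    return [0 if t >= word - active_bits else 1 for t in range(word)]


qam_reset = reset_word(30, 17, active_bits=30)            # active for M' bits
qbm_reset = reset_word(30, 17, active_bits=30 + 17 - 1)   # active for M' + N' - 1 bits
print(sum(qam_reset), qbm_reset[:3])
```

For QBM the reset reduces to a single "1" on the bit following the sign bit, as stated above.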

D. ERROR BOUNDS DUE TO FINITE PRECISION ARITHMETIC IN D.F.'S

Using the state space formulation of a second order digital filter, the difference between the states and outputs of a finite fixed point arithmetic D.F. and its infinite precision (ideal) counterpart is derived for the new quantization methods (QAA and QBM) introduced earlier. The QAA bound derivation follows a path similar to that used by S.R. Parker and Yakowitz [32] in their study of quantization after multiplication. A different approach is required to compare QAA with QBM. Rounding with quantization step h is assumed, so that each rounding error lies within ±h/2.
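1. Quantization After Addition (QAA)

The state equations for an ideal (infinite precision) single-input single-output second order D.F. can be expressed as follows:

$$
\begin{bmatrix} x_1(n) \\ x_2(n) \end{bmatrix}
= \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
\begin{bmatrix} x_1(n-1) \\ x_2(n-1) \end{bmatrix}
+ \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} u(n-1)
$$

$$
v(n) = \begin{bmatrix} c_1 & c_2 \end{bmatrix}
\begin{bmatrix} x_1(n-1) \\ x_2(n-1) \end{bmatrix} + d\,u(n-1) \qquad (5.1)
$$

or in vector notation

$$
\mathbf{x}(n) = A\,\mathbf{x}(n-1) + B\,u(n-1), \qquad
v(n) = C\,\mathbf{x}(n-1) + d\,u(n-1) \qquad (5.2)
$$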



Assuming quantization after addition (QAA) for the finite precision D.F., as shown in Figure 5-5, the following state equations apply:

$$
\begin{aligned}
x_1^*(n) &= [a_{11}x_1^*(n-1) + a_{12}x_2^*(n-1) + b_1 u(n-1)]_q \\
x_2^*(n) &= [a_{21}x_1^*(n-1) + a_{22}x_2^*(n-1) + b_2 u(n-1)]_q \\
v^*(n) &= [c_1 x_1^*(n-1) + c_2 x_2^*(n-1) + d\,u(n-1)]_q
\end{aligned} \qquad (5.3)
$$

where * indicates signals in the finite precision filter and [ ]_q denotes quantization. Or, in vector notation,

$$
\mathbf{x}^*(n) = [A\,\mathbf{x}^*(n-1) + B\,u(n-1)]_q, \qquad
v^*(n) = [C\,\mathbf{x}^*(n-1) + d\,u(n-1)]_q \qquad (5.4)
$$

where the input has been assumed quantized, i.e., u*(n-1) = [u(n-1)]_q. The output v*(n) also appears quantized, so that it can be used as the input to a following second order stage.

Figure 5-5. Quantization After Addition

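Define the error vector ε(n) and the output error e(n) as follows:

$$
\boldsymbol{\varepsilon}(n) = A\,\mathbf{x}^*(n-1) + B\,u(n-1) - [A\,\mathbf{x}^*(n-1) + B\,u(n-1)]_q
$$

$$
e(n) = C\,\mathbf{x}^*(n-1) + d\,u(n-1) - [C\,\mathbf{x}^*(n-1) + d\,u(n-1)]_q \qquad (5.5)
$$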

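Assuming rounding with quantization step h, the above errors are bounded:

$$
|\varepsilon_k(n)| \le \lambda_k \frac{h}{2}, \quad k = 1, 2; \qquad |e(n)| \le \lambda_3 \frac{h}{2} \qquad (5.6)
$$

where

$$
\lambda_k = \begin{cases} 0 & \text{if all elements in the } k\text{th row of the D.F. array are 0 or 1} \\ 1 & \text{otherwise} \end{cases}
$$

the D.F. array being $\begin{bmatrix} A & B \\ C & d \end{bmatrix}$, whose third row corresponds to the output.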



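Therefore, it is possible to find a constant vector ε̄ and a constant ē at least as large as the magnitudes of the corresponding elements of ε(n) and e(n). Then

$$
\langle \boldsymbol{\varepsilon}(n) \rangle \le \bar{\boldsymbol{\varepsilon}}, \qquad |e(n)| \le \bar{e} \qquad (5.7)
$$

where ⟨ ⟩ denotes the array of element magnitudes.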



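Defining the state and output errors in the same way as in [32], there results analogously

$$
\begin{aligned}
\mathbf{y}(n) &= \mathbf{x}(n) - \mathbf{x}^*(n) \\
&= A\,\mathbf{x}(n-1) + B\,u(n-1) - [A\,\mathbf{x}^*(n-1) + B\,u(n-1)]_q - A\,\mathbf{x}^*(n-1) + A\,\mathbf{x}^*(n-1) \\
&= A\,\mathbf{y}(n-1) + \boldsymbol{\varepsilon}(n)
\end{aligned} \qquad (5.8)
$$

$$
\begin{aligned}
\Delta v(n) &= v(n) - v^*(n) \\
&= C\,\mathbf{x}(n-1) + d\,u(n-1) - [C\,\mathbf{x}^*(n-1) + d\,u(n-1)]_q - C\,\mathbf{x}^*(n-1) + C\,\mathbf{x}^*(n-1) \\
&= C\,\mathbf{y}(n-1) + e(n)
\end{aligned} \qquad (5.9)
$$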



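The error propagation equation (5.8) and the output error equation (5.9) have exactly the same form as those derived for rounding after multiplication in [32]. Therefore, assuming y(0) = 0, they lead to a state error magnitude vector

$$
\langle \mathbf{y}(n) \rangle \le \sum_{l=0}^{n-1} \langle A^l \rangle\, \bar{\boldsymbol{\varepsilon}} \qquad (5.10)
$$

and an output error magnitude bound

$$
\langle \Delta v(n) \rangle \le \langle C \rangle \langle \mathbf{y}(n-1) \rangle + \bar{e} \qquad (5.11)
$$

The error bounds for QAA given by (5.10) and (5.11) are at most as large as those obtained for QAM, since QAA introduces at most one rounding error per row of the D.F. array, whereas QAM introduces one for every coefficient different from 0 and 1.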
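The bounds (5.10) and (5.11) can be checked against a simulation. The sketch below assumes an SM-type array with hypothetical coefficients (A = [[1.2, -0.5], [1, 0]], B = [1, 0], C = [0.3, 0.7], d = 1), h = 2^-8, and a random input already quantized to h. It uses Python's built-in `round`, which rounds ties to even but still keeps each error within ±h/2. The ideal and QAA filters of (5.2) and (5.4) are run side by side and every state and output error is compared with its bound; only row 1 of this array contains nontrivial coefficients, so ε̄ = [h/2, 0] and ē = h/2.

```python
import random


def matvec(P, v):
    return [P[0][0] * v[0] + P[0][1] * v[1], P[1][0] * v[0] + P[1][1] * v[1]]


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def quantize(value, h):
    return h * round(value / h)          # rounding: error within +-h/2


def simulate_qaa(A, B, C, d, u, h):
    """State errors y(n) and output errors dv(n), n = 1..len(u)-1, per (5.2)/(5.4)."""
    x, xq = [0.0, 0.0], [0.0, 0.0]       # ideal and finite-precision states, y(0) = 0
    y_err, v_err = [], []
    for n in range(1, len(u)):
        v_ideal = C[0] * x[0] + C[1] * x[1] + d * u[n - 1]
        v_fin = quantize(C[0] * xq[0] + C[1] * xq[1] + d * u[n - 1], h)
        x = [s + b * u[n - 1] for s, b in zip(matvec(A, x), B)]
        xq = [quantize(s + b * u[n - 1], h) for s, b in zip(matvec(A, xq), B)]
        y_err.append([x[0] - xq[0], x[1] - xq[1]])
        v_err.append(v_ideal - v_fin)
    return y_err, v_err


def state_bounds(A, eps_bar, steps):
    """Right-hand side of (5.10), sum_{l=0}^{n-1} <A^l> eps_bar, for n = 1..steps."""
    power, acc, out = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], []
    for _ in range(steps):
        acc = [acc[i] + abs(power[i][0]) * eps_bar[0] + abs(power[i][1]) * eps_bar[1]
               for i in range(2)]
        out.append(acc[:])
        power = matmul(power, A)
    return out


h = 2.0 ** -8
A = [[1.2, -0.5], [1.0, 0.0]]            # rows [-a, -b] and [1, 0] of an SM array
B = [1.0, 0.0]
C = [0.3, 0.7]
d = 1.0
random.seed(1)
u = [quantize(random.uniform(-0.2, 0.2), h) for _ in range(200)]

y_err, v_err = simulate_qaa(A, B, C, d, u, h)
eps_bar, e_bar = [h / 2, 0.0], h / 2     # QAA: one rounding in row 1, none in row 2
ybound = state_bounds(A, eps_bar, len(y_err))
# (5.11), with <y(0)> = 0 for n = 1
vbound = [e_bar] + [abs(C[0]) * b[0] + abs(C[1]) * b[1] + e_bar for b in ybound[:-1]]
```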



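For example, for an SM canonical array (see Ref. [25] for the definition of canonical arrays)

$$
\begin{bmatrix} -a & -b & 1 \\ 1 & 0 & 0 \\ c & e & 1 \end{bmatrix}
$$

the constant error bounds are

$$
\text{QAM:}\quad \bar{\varepsilon}_1 = 2\cdot\frac{h}{2} = h, \qquad \bar{\varepsilon}_2 = 0\cdot\frac{h}{2} = 0, \qquad \bar{e} = 2\cdot\frac{h}{2} = h
$$

$$
\text{QAA:}\quad \bar{\varepsilon}_1 = 1\cdot\frac{h}{2} = \frac{h}{2}, \qquad \bar{\varepsilon}_2 = 0\cdot\frac{h}{2} = 0, \qquad \bar{e} = 1\cdot\frac{h}{2} = \frac{h}{2}
$$

Using equations (5.10) and (5.11), which are linear in ε̄ and ē, it can be concluded that the error magnitude bound for QAA is one-half of that for QAM in this example.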
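The counting rule behind these values can be written as a small function. The sketch below assumes hypothetical numeric values for a, b, c and e (any values other than 0 and 1 give the same result) and uses exact fractions for h. QAM charges one h/2 for every nontrivial coefficient in a row, whereas QAA charges at most one h/2 per row.

```python
from fractions import Fraction


def is_trivial(coeff) -> bool:
    """Coefficients 0 and 1 need no multiplier, hence no rounding."""
    return coeff == 0 or coeff == 1


def error_bounds(array, h, scheme):
    """Per-row constant error bounds [eps_1, eps_2, e] of (5.7) for QAM or QAA."""
    bounds = []
    for row in array:
        products = sum(not is_trivial(c) for c in row)
        roundings = products if scheme == "QAM" else min(products, 1)
        bounds.append(roundings * h / 2)
    return bounds


a, b, c, e = 0.9, 0.4, 0.3, 0.7        # hypothetical nontrivial coefficients
sm_array = [[-a, -b, 1], [1, 0, 0], [c, e, 1]]
h = Fraction(1, 256)
print(error_bounds(sm_array, h, "QAM"), error_bounds(sm_array, h, "QAA"))
```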

2. Quantization Before Multiplication

In order to compare QAA with QBM, another approach will be used. Define the following error vectors:

$$
e_u(n-1) = u^*(n-1) - [u^*(n-1)]_q, \qquad
\mathbf{e}_x(n-1) = \mathbf{x}^*(n-1) - [\mathbf{x}^*(n-1)]_q \qquad (5.12)
$$
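and assuming that these errors are introduced before each multiplication process according to the values of the error control parameters α_ij, β_i, γ_j and δ (where i, j = 1, 2), as shown in Figure 5-6, the state equations of a finite precision D.F. can be written as

$$
\begin{bmatrix} x_1^*(n) \\ x_2^*(n) \end{bmatrix}
= \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
\begin{bmatrix} x_1^*(n-1) \\ x_2^*(n-1) \end{bmatrix}
- \begin{bmatrix} \alpha_{11}a_{11} & \alpha_{12}a_{12} \\ \alpha_{21}a_{21} & \alpha_{22}a_{22} \end{bmatrix}
\begin{bmatrix} e_{x_1}(n-1) \\ e_{x_2}(n-1) \end{bmatrix}
+ \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} u^*(n-1)
- \begin{bmatrix} \beta_1 b_1 \\ \beta_2 b_2 \end{bmatrix} e_u(n-1)
$$

$$
v^*(n) = \begin{bmatrix} c_1 & c_2 \end{bmatrix}
\begin{bmatrix} x_1^*(n-1) \\ x_2^*(n-1) \end{bmatrix}
- \begin{bmatrix} \gamma_1 c_1 & \gamma_2 c_2 \end{bmatrix}
\begin{bmatrix} e_{x_1}(n-1) \\ e_{x_2}(n-1) \end{bmatrix}
+ d\,u^*(n-1) - \delta d\,e_u(n-1)
$$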



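or in vector notation

$$
\begin{aligned}
\mathbf{x}^*(n) &= A\,\mathbf{x}^*(n-1) - \alpha A\,\mathbf{e}_x(n-1) + B\,u^*(n-1) - \beta B\,e_u(n-1) \\
v^*(n) &= C\,\mathbf{x}^*(n-1) - \gamma C\,\mathbf{e}_x(n-1) + d\,u^*(n-1) - \delta d\,e_u(n-1)
\end{aligned} \qquad (5.13)
$$

where αA, βB and γC denote the arrays [α_ij a_ij], [β_i b_i] and [γ_j c_j], respectively.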

If quantization after addition (QAA) is to be considered, all error control parameters (α, β, γ, δ) are set equal to one. This is equivalent to introducing the error after the delay operator rather than after the addition, but the value of the error is not affected.

Figure 5-6. Second Order Single-Input Single-Output Digital Filter. Quantization Before Multiplication

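If quantization before multiplication (QBM) is to be studied, then set

$$
\alpha_{ij} = \begin{cases} 0 & \text{if } a_{ij} = 1 \\ 1 & \text{if } a_{ij} \ne 1 \end{cases}, \qquad
\beta_i = \begin{cases} 0 & \text{if } b_i = 1 \\ 1 & \text{if } b_i \ne 1 \end{cases}, \qquad
\gamma_j = \begin{cases} 0 & \text{if } c_j = 1 \\ 1 & \text{if } c_j \ne 1 \end{cases}, \qquad
\delta = \begin{cases} 0 & \text{if } d = 1 \\ 1 & \text{if } d \ne 1 \end{cases} \qquad (5.14)
$$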



Here, the input signal has not been assumed to be quantized, since the output signal is not, in general, quantized. Therefore, these stages can also be cascaded.
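Rule (5.14) is mechanical, so it can be expressed directly. The sketch below applies it to an SM-type array with hypothetical coefficient values (a, b, c, e stand for any values other than 0 and 1); the function names are illustrative only.

```python
def control(coeff) -> int:
    """Error control parameter of (5.14): 0 when the coefficient is exactly 1."""
    return 0 if coeff == 1 else 1


def qbm_parameters(A, B, C, d):
    """alpha (2x2), beta (2), gamma (2) and delta for quantization before multiplication."""
    alpha = [[control(a) for a in row] for row in A]
    beta = [control(b) for b in B]
    gamma = [control(c) for c in C]
    delta = control(d)
    return alpha, beta, gamma, delta


a, b, c, e = 0.9, 0.4, 0.3, 0.7   # hypothetical SM-array coefficients
alpha, beta, gamma, delta = qbm_parameters([[-a, -b], [1, 0]], [1, 0], [c, e], 1)
print(alpha, beta, gamma, delta)
```

A parameter set to 1 in front of a zero coefficient is harmless, since the corresponding error term is multiplied by zero anyway.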

These errors are bounded and, assuming again rounding with quantization step h, it follows that

$$
|e_{x_1}(n-1)| \le \begin{cases} h/2 & \text{if at least one of the coefficients } (a_{11}, a_{21}, c_1) \text{ is neither 0 nor 1} \\ 0 & \text{otherwise} \end{cases}
$$

$$
|e_{x_2}(n-1)| \le \begin{cases} h/2 & \text{if at least one of the coefficients } (a_{12}, a_{22}, c_2) \text{ is neither 0 nor 1} \\ 0 & \text{otherwise} \end{cases}
$$

$$
|e_u(n-1)| \le \begin{cases} h/2 & \text{if at least one of the coefficients } (b_1, b_2, d) \text{ is neither 0 nor 1} \\ 0 & \text{otherwise} \end{cases}
$$

Then it is possible to find a constant error vector ē_x and a constant ē_u at least as large as the magnitudes of the corresponding elements of e_x(n-1) and e_u(n-1), that is,

$$
\langle \mathbf{e}_x(n-1) \rangle \le \bar{\mathbf{e}}_x, \qquad |e_u(n-1)| \le \bar{e}_u \qquad (5.15)
$$

where

$$
\bar{e}_{x_k} = \nu_k \frac{h}{2} \ (k = 1, 2), \qquad \bar{e}_u = \nu_3 \frac{h}{2}, \qquad
\nu_k = \begin{cases} 0 & \text{if all elements in the } k\text{th column of the D.F. array are 0 or 1} \\ 1 & \text{otherwise} \end{cases} \quad (k = 1, 2, 3)
$$

It can be observed that the value of each component of these constant bounds depends on the existence of columns of the D.F. array containing elements other than 0 and 1, rather than on the rows.
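Because (5.15) depends on columns rather than rows, the same SM-type array gives a different error pattern than under QAA. A minimal sketch, assuming the D.F. array is laid out as [A B; C d] and using hypothetical coefficient values:

```python
from fractions import Fraction


def column_flags(A, B, C, d):
    """nu_k of (5.15): 1 if column k of the D.F. array holds a value other than 0 or 1."""
    array = [A[0] + [B[0]], A[1] + [B[1]], C + [d]]
    return [int(any(row[k] not in (0, 1) for row in array)) for k in range(3)]


def qbm_error_bounds(A, B, C, d, h):
    """Constant bounds (e_x_bar, e_u_bar) of (5.15)."""
    nu = column_flags(A, B, C, d)
    return [nu[0] * h / 2, nu[1] * h / 2], nu[2] * h / 2


a, b, c, e = 0.9, 0.4, 0.3, 0.7       # hypothetical SM-array coefficients
h = Fraction(1, 256)
A, B, C, d = [[-a, -b], [1, 0]], [1, 0], [c, e], 1
e_x, e_u = qbm_error_bounds(A, B, C, d, h)
print(column_flags(A, B, C, d), e_x, e_u)
```

For this array the input column (1, 0, 1) is trivial, so no rounding error is charged to the input (ē_u = 0).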



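Defining the state and the output errors as before, it results that

$$
\begin{aligned}
\mathbf{y}(n) &= \mathbf{x}(n) - \mathbf{x}^*(n) \\
&= A\,\mathbf{x}(n-1) + B\,u^*(n-1) - [A\,\mathbf{x}^*(n-1) - \alpha A\,\mathbf{e}_x(n-1) + B\,u^*(n-1) - \beta B\,e_u(n-1)] \\
&= A\,\mathbf{y}(n-1) + \alpha A\,\mathbf{e}_x(n-1) + \beta B\,e_u(n-1)
\end{aligned} \qquad (5.16)
$$

$$
\begin{aligned}
\Delta v(n) &= v(n) - v^*(n) \\
&= C\,\mathbf{x}(n-1) + d\,u^*(n-1) - [C\,\mathbf{x}^*(n-1) - \gamma C\,\mathbf{e}_x(n-1) + d\,u^*(n-1) - \delta d\,e_u(n-1)] \\
&= C\,\mathbf{y}(n-1) + \gamma C\,\mathbf{e}_x(n-1) + \delta d\,e_u(n-1)
\end{aligned} \qquad (5.17)
$$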






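Assuming x(-1) = x*(-1) and e_x(-1) = e_u(-1) = 0, and using the error propagation equation (5.16),

$$
\begin{aligned}
\mathbf{y}(0) &= A[\mathbf{x}(-1) - \mathbf{x}^*(-1)] + \alpha A\,\mathbf{e}_x(-1) + \beta B\,e_u(-1) = \mathbf{0} \\
\mathbf{y}(1) &= A\,\mathbf{y}(0) + \alpha A\,\mathbf{e}_x(0) + \beta B\,e_u(0) = \alpha A\,\mathbf{e}_x(0) + \beta B\,e_u(0) \\
\mathbf{y}(2) &= A\,\mathbf{y}(1) + \alpha A\,\mathbf{e}_x(1) + \beta B\,e_u(1) \\
&= A[\alpha A\,\mathbf{e}_x(0) + \beta B\,e_u(0)] + \alpha A\,\mathbf{e}_x(1) + \beta B\,e_u(1)
\end{aligned}
$$

then

$$
\mathbf{y}(n) = \sum_{l=0}^{n-1} A^{\,n-l-1} [\alpha A\,\mathbf{e}_x(l) + \beta B\,e_u(l)] \qquad (5.18)
$$

$$
\mathbf{y}(n) = \sum_{l=0}^{n-1} A^{l} [\alpha A\,\mathbf{e}_x(n-l-1) + \beta B\,e_u(n-l-1)] \qquad (5.19)
$$

and from equation (5.17), using (5.18) and (5.19),

$$
\Delta v(n) = C \sum_{l=0}^{n-2} A^{\,n-l-2} [\alpha A\,\mathbf{e}_x(l) + \beta B\,e_u(l)] + \gamma C\,\mathbf{e}_x(n-1) + \delta d\,e_u(n-1) \qquad (5.20)
$$

$$
\Delta v(n) = C \sum_{l=0}^{n-2} A^{l} [\alpha A\,\mathbf{e}_x(n-l-2) + \beta B\,e_u(n-l-2)] + \gamma C\,\mathbf{e}_x(n-1) + \delta d\,e_u(n-1) \qquad (5.21)
$$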
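From equation (5.19) it follows that the state error magnitude vector is

$$
\langle \mathbf{y}(n) \rangle = \Big\langle \sum_{l=0}^{n-1} \big[ A^l \alpha A\,\mathbf{e}_x(n-l-1) + A^l \beta B\,e_u(n-l-1) \big] \Big\rangle
\le \sum_{l=0}^{n-1} \big[ \langle A^l \rangle \langle \alpha A \rangle\, \bar{\mathbf{e}}_x + \langle A^l \rangle \langle \beta B \rangle\, \bar{e}_u \big] \qquad (5.22)
$$

and from equation (5.21), using (5.17), the output error magnitude bound can be obtained:

$$
\langle \Delta v(n) \rangle \le \langle C \rangle \langle \mathbf{y}(n-1) \rangle + \langle \gamma C \rangle\, \bar{\mathbf{e}}_x + |\delta d|\, \bar{e}_u \qquad (5.23)
$$

where the state error bound ⟨y(n-1)⟩ is given by equation (5.22).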

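As observed previously, for QAA all error control parameters are equal to unity. Therefore, for QAA, equations (5.22) and (5.23) reduce to

$$
\langle \mathbf{y}(n) \rangle \le \sum_{l=0}^{n-1} \big[ \langle A^l \rangle \langle A \rangle\, \bar{\mathbf{e}}_x + \langle A^l \rangle \langle B \rangle\, \bar{e}_u \big] \qquad (5.24)
$$

$$
\langle \Delta v(n) \rangle \le \langle C \rangle \langle \mathbf{y}(n-1) \rangle + \langle C \rangle\, \bar{\mathbf{e}}_x + |d|\, \bar{e}_u \qquad (5.25)
$$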

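For QBM it holds that

$$
\langle \alpha A \rangle \le \langle A \rangle, \qquad \langle \beta B \rangle \le \langle B \rangle, \qquad \langle \gamma C \rangle \le \langle C \rangle, \qquad |\delta d| \le |d|
$$

since ⟨Q⟩ is defined as the matrix formed by the absolute value of each element of the matrix Q, and each error control parameter is either 0 or 1.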

Therefore the bounds for QBM given by equations (5.22) and (5.23) are at most as large as the bounds given for QAA by equations (5.24) and (5.25), respectively.
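The comparison can be made concrete for one filter. The sketch below assumes the hypothetical SM-type filter A = [[1.2, -0.5], [1, 0]], B = [1, 0], h = 2^-8, and the same ē_x = [h/2, h/2] and ē_u = 0 for both schemes, as the comparison above does. The QBM parameters come from (5.14): α = [[1, 1], [0, 1]] and β = [0, 1]. It evaluates the right-hand sides of (5.22) and (5.24), using ⟨A^l⟩⟨αA⟩ē_x + ⟨A^l⟩⟨βB⟩ē_u = ⟨A^l⟩(⟨αA⟩ē_x + ⟨βB⟩ē_u), and checks that the QBM bound never exceeds the QAA bound.

```python
def mat_abs(P):
    return [[abs(p) for p in row] for row in P]


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def matvec(P, v):
    return [P[0][0] * v[0] + P[0][1] * v[1], P[1][0] * v[0] + P[1][1] * v[1]]


def state_bound(A, AA, BB, e_x, e_u, n):
    """sum_{l=0}^{n-1} <A^l>(<AA> e_x + <BB> e_u), with AA, BB = alpha*A, beta*B (QBM) or A, B (QAA)."""
    drive = [s + abs(bb) * e_u for s, bb in zip(matvec(mat_abs(AA), e_x), BB)]
    power, total = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
    for _ in range(n):
        total = [t + s for t, s in zip(total, matvec(mat_abs(power), drive))]
        power = matmul(power, A)
    return total


h = 2.0 ** -8
A = [[1.2, -0.5], [1.0, 0.0]]
B = [1.0, 0.0]
alpha, beta = [[1, 1], [0, 1]], [0, 1]                       # from (5.14) for this array
alphaA = [[alpha[i][j] * A[i][j] for j in range(2)] for i in range(2)]
betaB = [beta[i] * B[i] for i in range(2)]
e_x, e_u = [h / 2, h / 2], 0.0

qaa = state_bound(A, A, B, e_x, e_u, 50)
qbm = state_bound(A, alphaA, betaB, e_x, e_u, 50)
print(qaa, qbm)
```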

E. CONCLUSIONS

Quantization after addition and quantization before multiplication have been shown to be applicable to hardware implementation of digital filters. Advantages of these two methods over the usual quantization after multiplication have been demonstrated, and QBM has been proved to be the more effective in reducing quantization error bounds. Therefore QBM is the most suitable form for hardware implementation of digital filters. The modification required to perform rounding or truncation before multiplication using the available NRMEC chips has been presented.
APPENDIX A

POLE-ZERO CORRESPONDENCE IN S AND Z-DOMAIN

1. Definition of the Z-Transform

Given a sequence {x(n)}, n = -∞, ..., ∞, the two-sided z-transform is defined as

$$
X(z) = Z[x(n)] \triangleq \sum_{n=-\infty}^{\infty} x(n)\, z^{-n} \qquad (A.1)
$$

When x(n) = 0 for n < 0, the one-sided z-transform is defined as

$$
X(z) \triangleq \sum_{n=0}^{\infty} x(n)\, z^{-n} \qquad (A.2)
$$



From the relation to the Laplace-Fourier transform,

$$
z = e^{sT} \qquad (A.3)
$$

and z^-1 = e^(-sT) is called the unit delay operator.

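2. Mapping S-Plane Into Z-Plane

Breaking s and z into real and imaginary parts,

$$
s = \sigma + j\omega \quad \text{and} \quad z = \alpha + j\nu
$$

Since

$$
z = e^{Ts} = e^{T\sigma} e^{j\omega T} = e^{T\sigma}\cos\omega T + j\,e^{T\sigma}\sin\omega T
$$

it follows that

$$
\alpha = e^{T\sigma}\cos\omega T, \qquad \nu = e^{T\sigma}\sin\omega T \qquad (A.4)
$$

When ω = 0, then from (A.4) ν = 0 and

$$
\alpha = e^{\sigma T} \begin{cases} > 1 & \text{for } \sigma > 0 \\ < 1 & \text{for } \sigma < 0 \end{cases}
$$

For a pole at σ = -∞, from (A.4) ν = 0 and α = 0, so such a pole maps onto the origin of the z-plane.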

For imaginary poles, σ = 0, we have from (A.4) ν = sin ωT and α = cos ωT, or ν² + α² = 1; therefore the imaginary axis of the s-plane maps onto the unit circle of the z-plane.

Figure A-1 summarizes the mapping of the s-plane into the z-plane. The left half s-plane is mapped inside the unit circle |z| = 1 of the z-plane. The imaginary axis of the s-plane is mapped onto |z| = 1. The right half s-plane is mapped into the region |z| > 1. The left-half-plane strip limited by ±ω_s/4 (a total width of half the sampling frequency) maps into the right half of the region |z| < 1. The left-half-plane strips bounded by +ω_s/4 and +ω_s/2, or by -ω_s/4 and -ω_s/2, map into the left half of the region |z| < 1. The point at infinity on the negative real s-axis is mapped into the z-plane origin, and the s-plane origin is mapped into the point +1 of the z-plane. It can be concluded that the farther the real component of an s-plane complex pole is located from the imaginary axis, the closer the corresponding z-plane pole is to the origin, which means that the discrete output sequence converges faster, i.e., the damping is more pronounced.



Figure A-1. Mapping s-Plane into z-Plane (s-plane on the left; z-plane with unit circle on the right)
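Each region of Figure A-1 can be checked by pushing sample points through z = exp(sT). The sketch below assumes a sampling period of T = 1 ms (so ω_s = 2π/T) and a handful of illustrative s-plane points, one per region discussed above.

```python
import cmath
import math

T = 1e-3                     # assumed sampling period
omega_s = 2 * math.pi / T    # sampling frequency in rad/s


def s_to_z(s: complex) -> complex:
    """Map an s-plane point through z = exp(sT)."""
    return cmath.exp(s * T)


examples = {
    "origin": s_to_z(0),
    "imaginary axis": s_to_z(1j * 0.3 * omega_s),
    "left strip |w| < ws/4": s_to_z(-200 + 1j * 0.1 * omega_s),
    "left strip ws/4 < w < ws/2": s_to_z(-200 + 1j * 0.35 * omega_s),
    "right half plane": s_to_z(200 + 1j * 0.1 * omega_s),
    "far left on real axis": s_to_z(-1e5),
}
for name, z in examples.items():
    print(f"{name:28s} z = {z:.4f}  |z| = {abs(z):.4f}")
```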
APPENDIX B

DISCRETE TRANSFER FUNCTION REALIZATION

1. Discrete Transfer Functions

A linear time-invariant discrete-time filter is described by the difference equation

$$
y(nT) = \sum_{k=0}^{M} a_k\, x[(n-k)T] - \sum_{k=1}^{N} b_k\, y[(n-k)T] \qquad (B.1)
$$

in which the discrete output, y(nT), is a linear combination of the present and M past input samples and of the N past output samples.

The transfer function of this discrete system, as in the continuous case, is defined as

$$
G(z) = \frac{Y(z)}{X(z)} \qquad (B.2)
$$

and taking the z-transform of (B.1) and rearranging gives

$$
G(z) = \frac{\sum_{k=0}^{M} a_k z^{-k}}{1 + \sum_{k=1}^{N} b_k z^{-k}} \qquad (B.3)
$$

Inspection of this transfer function shows that it has the same rational form as those obtained from the Laplace transform analysis of continuous systems described by linear, constant-coefficient, ordinary differential equations. The roots of the denominator of G(z) are called the poles of the discrete system, and the roots of the numerator are called the zeros. However, unlike the continuous case, where the poles must lie in the left half s-plane, the discrete system is stable, in the sense that every bounded input sequence yields a bounded output sequence, if and only if the poles of G(z) lie within the unit circle in the z-plane.
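Equation (B.1) and the stability condition translate directly into code. The sketch below assumes zero initial conditions, passes the coefficients as plain lists a = [a_0, ..., a_M] and b = [b_1, ..., b_N], and handles the pole test only for a second order denominator 1 + b_1 z^-1 + b_2 z^-2; the function names are illustrative.

```python
import cmath


def difference_equation(a, b, x):
    """Evaluate (B.1): y(n) = sum_{k=0}^{M} a_k x(n-k) - sum_{k=1}^{N} b_k y(n-k).

    a = [a_0, ..., a_M], b = [b_1, ..., b_N]; samples before n = 0 are zero.
    """
    y = []
    for n in range(len(x)):
        acc = sum(a[k] * x[n - k] for k in range(len(a)) if n >= k)
        acc -= sum(b[k - 1] * y[n - k] for k in range(1, len(b) + 1) if n >= k)
        y.append(acc)
    return y


def second_order_poles(b1, b2):
    """Roots of z^2 + b1 z + b2, i.e. the poles of 1 / (1 + b1 z^-1 + b2 z^-2)."""
    root = cmath.sqrt(b1 * b1 - 4 * b2)
    return (-b1 + root) / 2, (-b1 - root) / 2


def is_stable(b1, b2):
    """BIBO stable iff both poles lie strictly inside the unit circle."""
    return all(abs(p) < 1 for p in second_order_poles(b1, b2))


print(difference_equation([1.0], [-0.5], [1.0, 0.0, 0.0, 0.0]))  # [1.0, 0.5, 0.25, 0.125]
print(is_stable(-1.2, 0.5), is_stable(-2.1, 1.1))                 # True False
```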

The frequency spectrum of the discrete system is periodic in ω with period 2π/T due to sampling, and this spectrum can be computed by letting z = exp(jωT) in the transfer function.
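Substituting z = exp(jωT) into (B.3) gives the frequency response directly, and the 2π/T periodicity follows because exp(j2π) = 1. A short sketch, assuming T = 1 ms and a hypothetical first-order filter G(z) = 1/(1 - 0.5 z^-1):

```python
import cmath
import math


def freq_response(a, b, omega, T):
    """G(exp(j omega T)) for G(z) of (B.3); a = [a_0..a_M], b = [b_1..b_N]."""
    z = cmath.exp(1j * omega * T)
    num = sum(ak * z ** (-k) for k, ak in enumerate(a))
    den = 1 + sum(bk * z ** (-(k + 1)) for k, bk in enumerate(b))
    return num / den


T = 1e-3                                          # assumed sampling period
a, b = [1.0], [-0.5]                              # hypothetical first-order filter
dc = freq_response(a, b, 0.0, T)                  # z = 1
nyquist = freq_response(a, b, math.pi / T, T)     # z = -1
shifted = freq_response(a, b, 700.0 + 2 * math.pi / T, T)
print(dc, nyquist, shifted)
```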

2. Recursive Filter Realization

If, in the transfer function of a D.F., all b_k are zero, the filter has no feedback, as revealed by inspection of (B.1) or (B.3), and is said to be of the nonrecursive or transversal type.

If at least one b_k and one a_k are nonzero, the filter is called recursive.

The nonrecursive filter has finite memory and can have excellent phase characteristics, but tends to require a large number of terms to obtain a relatively sharp cutoff [16]. The recursive filter has infinite memory and tends to require fewer terms. Therefore sharp cutoff filters are much easier to design using a recursive structure. The design method for this type of filter will be discussed later.

A transfer function can be realized in direct form or by reduction to lower order forms, generally first or second order sections in a cascade, parallel or hybrid structure.
a. Direct Realization

From a given transfer function of a D.F. the difference equation (B.1) can be obtained, and by performing the operations directly implied by that equation the so-called "direct" realization is obtained, as shown in Figure B-1.

Using an intermediate variable w(n) such that

$$
w(nT) = x(nT) - \sum_{k=1}^{N} b_k\, w[(n-k)T]
$$

equation (B.1) can be written

$$
y(nT) = \sum_{k=0}^{M} a_k\, w[(n-k)T] \qquad (B.4)
$$

The realization based upon (B.4) is shown in Figure B-2 and is called the "canonical" realization of the filter, since the number of delays and multipliers is minimized.
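The two realizations compute the same output; they differ in storage. The direct form keeps separate delay lines for past inputs and past outputs (M + N delays), while the canonical form keeps a single delay line of w values (max(M, N) delays). A minimal sketch, assuming zero initial conditions and coefficient lists a = [a_0, ..., a_M], b = [b_1, ..., b_N]:

```python
def direct(a, b, x):
    """Direct realization of (B.1): separate delay lines for x and y."""
    y = []
    for n in range(len(x)):
        acc = sum(a[k] * x[n - k] for k in range(len(a)) if n >= k)
        acc -= sum(b[k - 1] * y[n - k] for k in range(1, len(b) + 1) if n >= k)
        y.append(acc)
    return y


def canonical(a, b, x):
    """Canonical realization of (B.4): one shared delay line holding w."""
    order = max(len(a) - 1, len(b))
    delay = [0.0] * order                                        # w(n-1), ..., w(n-order)
    y = []
    for xn in x:
        w = xn - sum(b[k] * delay[k] for k in range(len(b)))     # b[k] holds b_{k+1}
        taps = [w] + delay                                       # w(n), w(n-1), ...
        y.append(sum(a[k] * taps[k] for k in range(len(a))))
        delay = taps[:order]
    return y


impulse = [1.0] + [0.0] * 7
print(canonical([1.0, 0.5], [-0.5], impulse)[:4])    # [1.0, 1.0, 0.5, 0.25]
```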

Figure B-1. Direct Realization

Figure B-2. Canonical Realization

b. Reduction to Lower Order Forms

This form is more convenient because lower order forms present not only a smaller coefficient sensitivity [16] but also a reduced quantization noise effect [18]. Thus, a higher order filter is obtained by combining first and second order sections.

(1) Cascade Realization

By factoring, the overall transfer function can be written, associating zeros and poles, in the form

$$
H(z) = K_c \prod_{i=1}^{c} G_i(z) \qquad (B.5)
$$

as illustrated in Figure B-3, where G_i(z) represents the transfer function of the first or second order sections.
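Running sections in series is equivalent to a single section whose numerator and denominator are the products of the section polynomials. The sketch below checks this for two hypothetical sections with K_c = 1, using zero initial conditions and polynomials in z^-1 stored constant term first.

```python
def poly_mul(p, q):
    """Product of two polynomials in z^-1 (coefficient lists, constant term first)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def run_section(num, den, x):
    """Filter x through num(z)/den(z), with den[0] == 1 (difference equation B.1)."""
    y = []
    for n in range(len(x)):
        acc = sum(num[k] * x[n - k] for k in range(len(num)) if n >= k)
        acc -= sum(den[k] * y[n - k] for k in range(1, len(den)) if n >= k)
        y.append(acc)
    return y


# Two hypothetical sections G1(z), G2(z); K_c = 1 for simplicity.
g1 = ([1.0, 0.5], [1.0, -0.5])               # first order section
g2 = ([0.2, 0.4, 0.2], [1.0, -1.2, 0.5])     # second order section
impulse = [1.0] + [0.0] * 15
cascade = run_section(*g2, run_section(*g1, impulse))
overall = run_section(poly_mul(g1[0], g2[0]), poly_mul(g1[1], g2[1]), impulse)
print(cascade[:4], overall[:4])
```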

(2) Parallel Realization

By partial fraction expansion, a transfer function with simple poles can be written as the sum of first and second order transfer functions, in the form

$$
H(z) = K_p + \sum_{i=1}^{p} G_i(z) \qquad (B.6)
$$

as realized in Figure B-4.

If the transfer function has multiple poles, higher order sections will be required.

The parallel realization permits easy scaling of the D.F., but the overall transfer function and its zeros are not readily identifiable.

(3) Hybrid Realization

The hybrid form is a combination of parallel and cascade, as shown in Figure B-5 (a) and (b). The design to obtain the hybrid form is not as simple, and it should only be used when the cascade form becomes too difficult to scale.

3. Nonrecursive Filter Realization

The z-transform methods applied to a continuous filter transfer function cannot be used to obtain nonrecursive filters, also called transversal filters. This type of filter is very useful, in particular, if linear phase, minimum phase or a prescribed magnitude characteristic is desired.
Figure B-3. Cascade Realization of H(z): $H(z) = K_c \prod_{i=1}^{c} G_i(z)$

Figure B-4. Parallel Realization of H(z): $H(z) = K_p + \sum_{i=1}^{p} G_i(z)$

Figure B-5. Hybrid Realizations:
(a) $H(z) = K_1[K_2 + K_3 G_1(z) G_2(z) + K_4 G_3(z) G_4(z)]$
(b) $H(z) = K_1[K_2 + G_1(z) + G_2(z)][K_3 + G_3(z) + G_4(z)]$



a. Convolution Approach

For a linear discrete system the following convolution summation applies:

$$
y(nT) = \sum_{m=0}^{M} x(mT)\, h[(n-m)T] \qquad (B.7)
$$

where h[(n-m)T] is the discrete impulse response delayed by mT.

From equation (B.7) a discrete-time transfer function is obtained:

$$
G(z) = \frac{Y(z)}{X(z)} = \sum_{l=0}^{\infty} h(lT)\, z^{-l} \qquad (B.8)
$$

$$
G(z) = h(0) + h(T)\, z^{-1} + h(2T)\, z^{-2} + \cdots \qquad (B.9)
$$

which leads to the nonrecursive or transversal filter realization shown in Figure B-6.
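The convolution sum (B.7) and the tapped delay line of Figure B-6 are two views of the same filter. A minimal sketch, assuming a finite impulse response given as a list h = [h(0), h(T), ...] and zero initial conditions:

```python
def transversal(h, x):
    """y(n) = sum_m x(m) h(n - m), per (B.7), for a finite impulse response h."""
    return [sum(x[m] * h[n - m] for m in range(n + 1) if n - m < len(h))
            for n in range(len(x))]


def transversal_delay_line(h, x):
    """Same filter as a tapped delay line: z^-1 elements holding past inputs."""
    line = [0] * len(h)                    # x(n), x(n-1), ..., x(n-len(h)+1)
    out = []
    for xn in x:
        line = [xn] + line[:-1]            # shift the delay line by one sample
        out.append(sum(hk * xk for hk, xk in zip(h, line)))
    return out


print(transversal([1, 2, 3], [1, 0, 0, 0, 0]))   # [1, 2, 3, 0, 0]
```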
b. Fourier Series Approach

As mentioned before, a nonrecursive filter has all b_k equal to zero. Then, from equation (B.3),

$$
G(z) = \sum_{k=0}^{\infty} a_k\, z^{-k} \qquad (B.10)
$$

letting M generically go to infinity. Equations (B.8) and (B.10) are equivalent, with a_k = h(kT).

Due to sampling, the frequency response of a discrete-time filter is periodic, with period equal to its sampling frequency, ω_s = 2π/T. This periodic frequency response may be represented as a Fourier series. The form of the series to be used will depend on whether the desired frequency characteristic is an odd or an even function with respect to zero frequency.

Figure B-6. Block Diagram of a Non-Recursive or Transversal Filter

Even functions can be written in the form

$$
G_e(j\omega) = A_0 + \sum_{n=1}^{\infty} A_n \cos(\omega n T) \qquad (B.11)
$$

and odd functions in the form

$$
G_o(j\omega) = \sum_{n=1}^{\infty} B_n \sin(\omega n T) \qquad (B.12)
$$

Using the relation z = exp(jωT), equations (B.11) and (B.12) can be written as (B.13) and (B.14):

$$
G_e(j\omega) = A_0 + \sum_{n=1}^{\infty} \frac{A_n}{2} \left(z^{n} + z^{-n}\right) \qquad (B.13)
$$

$$
G_o(j\omega) = \sum_{n=1}^{\infty} \frac{B_n}{2j} \left(z^{n} - z^{-n}\right) \qquad (B.14)
$$

To obtain filters with real coefficients, the j of equation (B.14) can be dropped. The resulting filter will have a phase displacement of 90° from the theoretical function, but the magnitude function will not be affected.

Figures B-7 and B-8 illustrate the block diagram realization of nonrecursive filters for finite Fourier cosine and sine series, respectively (finite, since the summation stops after N terms).

Figure B-7. Block Diagram of Transversal Filter Mechanization for Finite Fourier Cosine Series: $H(z) = z^{-N}\left[A_0 + \sum_{n=1}^{N} \frac{A_n}{2}\left(z^{n} + z^{-n}\right)\right]$

Figure B-8. Block Diagram of Transversal Filter Mechanization for Finite Fourier Sine Series: $H(z) = z^{-N}\left[\sum_{n=1}^{N} \frac{B_n}{2}\left(z^{n} - z^{-n}\right)\right]$
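As an illustration of the cosine-series mechanization, the sketch below designs an ideal low-pass response (1 for |ωT| < ω_c T, 0 elsewhere in the period). The standard Fourier coefficients of that response are A_0 = ω_c T/π and A_n = 2 sin(nω_c T)/(nπ). The z^-N factor turns the symmetric series into the causal taps h(0) ... h(2N), symmetric about h(N) = A_0. The cutoff ω_c T = π/2 and N = 20 are assumed example values.

```python
import math


def cosine_series_lowpass(wc_T: float, N: int) -> list[float]:
    """Taps of H(z) = z^-N [A0 + sum_{n=1}^{N} (An/2)(z^n + z^-n)] for an ideal
    low-pass response (1 for |w T| < wc_T, 0 elsewhere in the period)."""
    A0 = wc_T / math.pi
    A = [2 * math.sin(n * wc_T) / (n * math.pi) for n in range(1, N + 1)]
    half = [an / 2 for an in A]
    return half[::-1] + [A0] + half          # h(0) ... h(2N), symmetric about h(N)


def magnitude(taps, w_T: float) -> float:
    """|H(exp(j w T))| with H(z) = sum_k taps[k] z^-k."""
    return abs(sum(t * complex(math.cos(k * w_T), -math.sin(k * w_T))
                   for k, t in enumerate(taps)))


taps = cosine_series_lowpass(math.pi / 2, 20)
print(len(taps), magnitude(taps, 0.0), magnitude(taps, math.pi))
```

The symmetric taps give a linear phase (a pure delay of NT), which is one reason the text singles out nonrecursive filters when linear phase is desired.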

3. Windowing 

In order to establish a physical realizable filter design, 
the summation in equations (B.8), (B.13) and (B.l 4 ) must 
stop after N terms. 

Truncating the response from an infinite number of terms
distorts the frequency response curve. This distortion,
called the "Gibbs phenomenon", is what normally happens when
a Fourier series is truncated.

This truncation is equivalent to multiplication by a
window function that is nonzero over a length of time NT;
in the frequency domain it is equivalent to the convolution
$G'(\omega) = G(\omega) * W(\omega)$. This convolution accounts
for the distortion, but it also helps to reduce it if a
proper window function is chosen. In general, the window
function low-pass filters, or smooths, the magnitude response.
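The effect can be seen numerically. This is a minimal sketch assuming NumPy, an ideal even low-pass with cutoff $\omega_c T = \pi/2$ (so $A_0 = \omega_c T/\pi$ and $A_n = 2\sin(n\omega_c T)/(n\pi)$), and N = 20. All names are hypothetical. Plain truncation (a rectangular window) leaves the Gibbs overshoot of roughly 9 % near the cutoff. Weighting the coefficients with the right half of a symmetric (2N+1)-point Hamming window, $w_n = 0.54 + 0.46\cos(\pi n/N)$, smooths it away.

```python
import numpy as np

def lowpass_cosine_coeffs(wcT, N):
    """A_0..A_N of the ideal even low-pass: G = 1 for |wT| < wcT (period 2*pi)."""
    n = np.arange(1, N + 1)
    return np.concatenate(([wcT / np.pi], 2 * np.sin(n * wcT) / (n * np.pi)))

def truncated_response(A, wT, window=None):
    """A_0 w_0 + sum_n w_n A_n cos(n wT); window=None means plain truncation."""
    N = len(A) - 1
    w = np.ones(N + 1) if window is None else window
    n = np.arange(N + 1)
    return np.cos(np.outer(wT, n)) @ (w * A)

N = 20
A = lowpass_cosine_coeffs(np.pi / 2, N)
wT = np.linspace(0.0, np.pi, 4001)
ham = np.hamming(2 * N + 1)[N:]                 # w_n for n = 0..N
print(truncated_response(A, wT).max())          # about 1.09 (Gibbs overshoot)
print(truncated_response(A, wT, ham).max())     # much closer to 1
```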

The best-known windows are the Hamming and Hanning windows [14].

The Kaiser window [16] is relatively easy to use, exhibits
superior side-lobe suppression, and produces designs that
compare with those developed through more involved
procedures [3].
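The side-lobe behaviour of these windows can be compared with a short NumPy sketch. The length M = 51 and the Kaiser parameter β = 8 are arbitrary illustrative choices, and `peak_sidelobe_db` is a hypothetical helper. It measures the highest side lobe of each window's spectrum relative to the main lobe, using a zero-padded FFT. The rectangular window (plain truncation) should come out near the familiar -13 dB, with the Hanning, Hamming and Kaiser windows progressively lower.

```python
import numpy as np

def peak_sidelobe_db(w, nfft=1 << 14):
    """Highest side lobe of the window spectrum in dB relative to the main lobe."""
    W = np.abs(np.fft.rfft(w, nfft))
    W = np.maximum(W / W[0], 1e-15)     # W[0] = sum(w) is the peak for w >= 0
    k = 1
    # Walk down the main lobe until its first null (first local minimum).
    while k < len(W) - 1 and not (W[k] <= W[k - 1] and W[k] <= W[k + 1]):
        k += 1
    return 20 * np.log10(W[k:].max())

M = 51
windows = {
    "rectangular": np.ones(M),
    "hanning": np.hanning(M),
    "hamming": np.hamming(M),
    "kaiser (beta=8)": np.kaiser(M, 8.0),
}
levels = {name: peak_sidelobe_db(w) for name, w in windows.items()}
for name, level in levels.items():
    print(f"{name:16s} {level:7.1f} dB")
```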
APPENDIX C



FUNCTIONAL TRANSFORMS 

The three most common methods of mapping a transfer
function from the s-domain to the z-domain are the standard,
bilinear, and matched z-transforms.

The bilinear and matched transforms are optimized for sine
waves, yielding the most accurate transforms in the
communications field. A summary of each transform is
presented next, and a comparison table is shown in Figure C-1.

Hand calculation of these transforms for anything beyond a
first-order, single-stage filter is extremely laborious and
requires a high level of numerical accuracy. A computer
program [13] is therefore helpful.

1. Standard z-Transform

The standard, or impulse-invariant, z-transform uses the
transformation $z = e^{sT}$. It requires a partial-fraction
expansion of the transfer function of the continuous
filter. A sum of first-order terms is thus obtained, and
the exponential transform indicated in Figure C-1 is
applied to each one, yielding a parallel realization. In
general, this representation gives excellent results when
applied to all-pole low-pass and bandpass filters [12].

Bandstop and high-pass filters can only be designed this
way by adding in cascade a wideband low-pass filter, called
a "guard filter", in order to eliminate folding (aliasing).
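A minimal sketch of the parallel realization, assuming NumPy and the common convention that each partial-fraction term $r/(s-p)$ maps to $T\,r/(1 - e^{pT}z^{-1})$. The factor T is an assumed scaling, since Figure C-1's exact form is not reproduced here. With this scaling the digital impulse response equals T times samples of the analog one. The example $H(s) = 1/((s+1)(s+2)) = 1/(s+1) - 1/(s+2)$ and the helper names are hypothetical.

```python
import numpy as np

def impulse_invariant_sections(residues, poles, T):
    """Map each term r/(s - p) to the first-order section T*r / (1 - e^{pT} z^-1)."""
    return [(T * r, np.exp(p * T)) for r, p in zip(residues, poles)]

def parallel_filter(sections, x):
    """Run the first-order sections in parallel and add their outputs."""
    y = np.zeros(len(x), dtype=complex)
    for g, zp in sections:
        state = 0.0
        for k, xk in enumerate(x):
            state = zp * state + g * xk     # y_i(k) = e^{pT} y_i(k-1) + T r x(k)
            y[k] += state
    return y.real

T = 0.1
impulse = np.zeros(50)
impulse[0] = 1.0
h = parallel_filter(impulse_invariant_sections([1.0, -1.0], [-1.0, -2.0], T), impulse)
n = np.arange(50)
# Analog impulse response h_c(t) = e^{-t} - e^{-2t}, sampled at t = nT and scaled by T.
print(np.allclose(h, T * (np.exp(-n * T) - np.exp(-2 * n * T))))   # True
```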
2. Bilinear z-Transform



The bilinear z-transform (trapezoidal integration)
eliminates the folding problem of the standard z-transform
and is very useful for realizing digital filters that have
relatively constant magnitude passband and stopband
characteristics.

This transformation,

$$s = \frac{2}{T}\,\frac{1 - z^{-1}}{1 + z^{-1}},$$

is an algebraic one, so it can be applied to either the factored
or the unfactored transfer function of the continuous filter.

This mapping, however, distorts (warps) the frequency axis.
It is therefore necessary to prewarp the desired radian
frequency response before applying the transformation: each
critical frequency $\omega_c$ is replaced by
$\frac{2}{T}\tan\left(\frac{\omega_c T}{2}\right)$. Even so, the
two frequency responses are not exactly equivalent, so care
must be taken when designing filters with critical
frequencies near half the sampling frequency.
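The benefit of prewarping can be checked with a first-order example. This is a sketch under assumptions: the analog prototype $\Omega/(s+\Omega)$, T = 1 ms and a cutoff at 0.8 of the half-sampling frequency are illustrative choices, and the helper names are hypothetical. Substituting the bilinear map gives $H(z) = \Omega(1+z^{-1}) / [(K+\Omega) + (\Omega-K)z^{-1}]$ with $K = 2/T$. With prewarping, the digital filter is exactly 3 dB down ($1/\sqrt{2}$) at $\omega_c$. Without it, the cutoff is badly displaced this close to half the sampling frequency.

```python
import cmath
import math

def bilinear_first_order_lowpass(wc, T, prewarp=True):
    """Coefficients (b, a) of Omega/(s + Omega) under s = (2/T)(1 - z^-1)/(1 + z^-1)."""
    K = 2.0 / T
    Om = K * math.tan(wc * T / 2) if prewarp else wc   # analog cutoff actually used
    b = [Om / (K + Om), Om / (K + Om)]                  # numerator: z^0, z^-1
    a = [1.0, (Om - K) / (K + Om)]                      # denominator: z^0, z^-1
    return b, a

def freq_resp(b, a, w, T):
    z1 = cmath.exp(-1j * w * T)                         # z^-1 on the unit circle
    return (b[0] + b[1] * z1) / (a[0] + a[1] * z1)

T = 1e-3
wc = 0.8 * math.pi / T          # close to half the sampling frequency, pi/T
for pw in (True, False):
    b, a = bilinear_first_order_lowpass(wc, T, prewarp=pw)
    print(pw, abs(freq_resp(b, a, wc, T)))   # True: 0.7071..., False: about 0.38
```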

3. Matched z-Transform 

This transformation generates a digital transfer function
whose poles and zeros are matched to those of the continuous
function. The exponential transformation $z = e^{sT}$ is
applied to the poles and zeros. It requires factoring both
the numerator and the denominator of the continuous transfer
function into terms of the form $(s - b)$, each of which is
replaced by $(1 - e^{bT}z^{-1})$. Additional zeros at half the
sampling frequency may be required so that the numbers of
poles and zeros are the same.
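A minimal sketch of the pole/zero mapping, assuming NumPy. The helper `matched_z` is hypothetical, and gain matching is left out. Each continuous root b becomes a digital root $e^{bT}$, and zeros at $z = -1$ (half the sampling frequency) are appended until the numbers of zeros and poles agree. The two-pole example with T = 0.1 s is arbitrary.

```python
import numpy as np

def matched_z(zeros_s, poles_s, T):
    """Map every root b of H(s) to z = e^{bT}; a factor (s - b) becomes (1 - e^{bT} z^-1).

    Zeros at z = -1 (half the sampling frequency) are appended until the
    numbers of zeros and poles agree.
    """
    zd = [np.exp(z * T) for z in zeros_s]
    pd = [np.exp(p * T) for p in poles_s]
    zd += [-1.0] * max(0, len(pd) - len(zd))
    return np.array(zd), np.array(pd)

# H(s) = 1 / ((s + 1)(s + 2)): two poles, no finite zeros.
zd, pd = matched_z([], [-1.0, -2.0], 0.1)
print(zd)   # [-1. -1.]
print(pd)   # approximately [0.9048 0.8187]
```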
Table C-1. Comparison of the three types of z-transforms
available to transform poles and zeros
(transfer function) from the s-plane to the
z-plane
AMPLITUDE BOUND OF LIMIT CYCLES IN
D.F. USING LYAPUNOV'S DIRECT METHOD

For the case of quantization after addition (QAA), a
second-order digital filter section with two poles and no
zeros will be studied in the same way, and the results will be
compared with those obtained by Parker and Hess [1] for
quantization after multiplication (QAM).

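The system presented in Figure D-1 for QAM can be
redrawn as shown in Figure D-2 for roundoff after addition,
and it is then described by the following difference
equation (with u(n) = 0):

$$x^*(n) = \left[-a\,x^*(n-1) - b\,x^*(n-2)\right]_R \qquad \text{(D.1)}$$

where $[\,\cdot\,]_R$ denotes rounding of the complete sum to the
nearest quantization level. For a normalized quantization
step (h = 1), this equation can be written as

$$x^*(n) = -a\,x^*(n-1) - b\,x^*(n-2) \pm \left[0.5 - \delta(n)\right] \qquad \text{(D.2)}$$

where $0 \le \delta(n) \le 1$.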
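The zero-input behaviour of (D.1) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the hardware mechanization. It assumes h = 1, rounding to the nearest integer with ties rounded up, and the hypothetical coefficients a = -1.6, b = 0.9, which satisfy b < 1 and |a| < 1 + b, so the unquantized filter is stable. Started from x*(-2) = x*(-1) = 1, the rounded filter stays at 1 forever, a constant limit cycle, because -a - b = 0.7 rounds back to 1. The ideal linear filter decays to zero.

```python
import math

def zero_input_response(a, b, x_nm2, x_nm1, steps, quantize=True):
    """x(n) = [-a x(n-1) - b x(n-2)]_R with h = 1 and u(n) = 0.

    quantize=True rounds the complete sum once (quantization after
    addition, ties rounded up); quantize=False gives the ideal linear filter.
    """
    x2, x1 = x_nm2, x_nm1            # x(n-2), x(n-1)
    out = []
    for _ in range(steps):
        s = -a * x1 - b * x2         # exact sum of the two products
        x = math.floor(s + 0.5) if quantize else s
        out.append(x)
        x2, x1 = x1, x
    return out

print(zero_input_response(-1.6, 0.9, 1, 1, 6))     # [1, 1, 1, 1, 1, 1]
linear = zero_input_response(-1.6, 0.9, 1, 1, 300, quantize=False)
print(abs(linear[-1]) < 1e-3)                       # True
```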
Figure D-1. Second Order D.F. with Two Poles Using Quantization After Multiplication

Figure D-2. Second Order D.F. with Two Poles Using Quantization After Addition
The roundoff noise sequence $e(n) = 0.5 - \delta(n)$ ranges
between ±0.5 and can be considered as a driving function of
the difference equation (D.2) for the study of the natural
response of the system (zero input, initial conditions only).

The error source can then be treated as an input with

$$|u(n)| = |e(n)| \le 0.5$$

and, using the state variables $x_1^*(n) = x^*(n-2)$,
$x_2^*(n) = x^*(n-1)$ and $u(n) = e(n)$, equation (D.2) can be
written as



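$$\mathbf{x}^*(n+1) = \begin{bmatrix} 0 & 1 \\ -b & -a \end{bmatrix} \mathbf{x}^*(n) + \begin{bmatrix} 0 \\ 1 \end{bmatrix} u(n) \qquad \text{(D.3)}$$

or

$$\mathbf{x}^*(n+1) = A\,\mathbf{x}^*(n) + B\,u(n) \qquad \text{(D.4)}$$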
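As a consistency check on (D.3), the NumPy sketch below iterates the state equation and the scalar difference equation (D.2) side by side, with the error term supplied as u(n). The initial values, the seed and the coefficients a = -1.6, b = 0.9 are arbitrary assumptions. The state vector is $[x^*(n-2),\ x^*(n-1)]$, so after each step its second component is the newest output sample $x^*(n)$.

```python
import numpy as np

def run_state_space(a, b, x_nm2, x_nm1, u):
    """Iterate (D.3): state [x*(n-2), x*(n-1)] -> [x*(n-1), x*(n)]."""
    A = np.array([[0.0, 1.0], [-b, -a]])
    B = np.array([0.0, 1.0])
    x = np.array([x_nm2, x_nm1], dtype=float)
    out = []
    for un in u:
        x = A @ x + B * un
        out.append(x[1])              # newest sample x*(n)
    return np.array(out)

def run_difference(a, b, x_nm2, x_nm1, u):
    """Iterate (D.2) directly, with the roundoff term supplied as u(n)."""
    y2, y1 = x_nm2, x_nm1
    out = []
    for un in u:
        y = -a * y1 - b * y2 + un
        out.append(y)
        y2, y1 = y1, y
    return np.array(out)

rng = np.random.default_rng(0)
u = rng.uniform(-0.5, 0.5, 50)        # any sequence with |u(n)| <= 0.5
print(np.allclose(run_state_space(-1.6, 0.9, 3.0, -2.0, u),
                  run_difference(-1.6, 0.9, 3.0, -2.0, u)))   # True
```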



The transfer function of this filter is

$$G(z) = \frac{1}{1 + a z^{-1} + b z^{-2}} \qquad \text{(D.5)}$$

and its characteristic equation is

$$1 + a z^{-1} + b z^{-2} = 0 \qquad \text{(D.6)}$$
The steady-state frequency response is obtained by
setting $z = e^{j\omega T}$, where T is the sampling interval.

Therefore,

$$0 = 1 + a(\cos\omega T - j\sin\omega T) + b(\cos 2\omega T - j\sin 2\omega T)$$

$$0 = 1 + a\cos\omega T + b\cos 2\omega T - j\sin(\omega T)\,(a + 2b\cos\omega T) \qquad \text{(D.7)}$$

This equation is satisfied only if the real and imaginary
parts vanish simultaneously. The imaginary part is zero in
two cases:

i) $a + 2b\cos\omega T = 0$, that is, $\cos\omega T = -\dfrac{a}{2b}$.
Since $\cos 2\omega T = 2\cos^2\omega T - 1 = \dfrac{a^2}{2b^2} - 1$,
the real part becomes

$$0 = 1 + a\left(-\frac{a}{2b}\right) + b\left(\frac{a^2}{2b^2} - 1\right) = 1 - b \qquad \text{(D.8)}$$

ii) $\sin\omega T = 0$, that is, $\omega T = k\pi$, $k = 0, 1, 2, \ldots$;
the real part becomes

$$0 = 1 + (-1)^k a + b \qquad \text{(D.9)}$$
Equations (D.8) and (D.9) are the stability boundaries
for a so-called "linear" second-order D.F.

Here the term linear means that the overflow or saturation
arithmetic that may occur for large signal amplitudes is not
considered; only the nonlinear characteristic of the
quantizer is taken into account. Small signal amplitudes
are therefore assumed.

A linear filter, as defined above, then has the
stability boundaries

$$1 - b = 0 \;\Rightarrow\; b = 1, \qquad 1 \pm a + b = 0 \;\Rightarrow\; |a| = 1 + b \qquad \text{(D.10)}$$



Therefore, for b < 1 and |a| < 1 + b, the corresponding
linear system is asymptotically stable in the large (ASIL).
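The stability region can be cross-checked against the roots of (D.6), multiplied through by $z^2$ to give $z^2 + az + b = 0$. The NumPy sketch below does this on a grid of (a, b) values. The grid is an arbitrary choice, offset so that no point falls exactly on a boundary. For every point it compares the condition b < 1, |a| < 1 + b with a direct test that both roots lie inside the unit circle. Note that b > -1 is implied, because |a| ≥ 0.

```python
import numpy as np

def in_stability_triangle(a, b):
    """Condition from the text: b < 1 and |a| < 1 + b."""
    return b < 1 and abs(a) < 1 + b

def roots_inside_unit_circle(a, b):
    """Both roots of z^2 + a z + b = 0 strictly inside |z| = 1."""
    return bool(np.max(np.abs(np.roots([1.0, a, b]))) < 1.0)

mismatches = [
    (a, b)
    for a in np.arange(-2.47, 2.5, 0.1)
    for b in np.arange(-1.45, 1.5, 0.1)
    if in_stability_triangle(a, b) != roots_inside_unit_circle(a, b)
]
print(len(mismatches))   # 0
```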

Since the input is also bounded for all $n \ge 0$, the
theorem given in the Appendix of [1] can be applied.

It states that, for a system described by the state equation
$\mathbf{x}(n+1) = A\,\mathbf{x}(n) + B\,u(n)$, if the homogeneous
system is ASIL and has a Lyapunov function $V = \mathbf{x}^TQ\mathbf{x}$
with $\Delta V = -\mathbf{x}^TC\mathbf{x}$, and $|u(n)| \le \bar u$ for
all $n \ge 0$, then the system is stable and the states are
certain to enter the region defined by $\|\mathbf{x}\| \le K_2$, where
$$K_2 = \bar u\,\sqrt{\frac{\lambda_{\max}(Q)}{\lambda_{\min}(Q)}}\left[\frac{\|A^TQB\|}{\lambda_{\min}(C)} + \sqrt{\frac{\|A^TQB\|^2}{\lambda_{\min}^2(C)} + \frac{B^TQB}{\lambda_{\min}(C)}}\,\right] \qquad \text{(D.11)}$$



with

$\lambda_{\min}(Q)$ = minimum eigenvalue of matrix Q;

$\lambda_{\max}(Q)$ = maximum eigenvalue of matrix Q;

$\lambda_{\min}(C)$ = minimum eigenvalue of matrix C;

$\|A^TQB\|$ = norm of the matrix product $A^TQB$, defined as
$\max_{i,j} |a_{ij}|$, where $a_{ij}$ are the elements of $A^TQB$;

$\|\mathbf{x}\|$ = norm of the state vector.

The Lyapunov function $V = \mathbf{x}^TQ\mathbf{x}$, where Q is a real
symmetric positive definite matrix (RSPDM), can be
found for any RSPDM C from the equation

$$-C = A^TQA - Q \qquad \text{(D.12)}$$



In this case

$$Q = \begin{bmatrix} q_{11} & q_{12} \\ q_{12} & q_{22} \end{bmatrix}$$
and, since the choice of C is arbitrary as long as it is an RSPDM,
C is chosen equal to the 2 × 2 identity matrix. Equation
(D.12) then gives



-1 


0 




"0 


-b" 




^ 0 


- 1 _ 




_1 


-a. 





^11 ^12 



il2 ^ 22 ' 



'0 1 ■ 

-b -a 



^11 ^12 



^12 ^22 



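whose solution is

$$q_{11} = 1 + \frac{2b^2(1+b)}{(1-b)\left[(1+b)^2 - a^2\right]}, \qquad q_{12} = \frac{2ab}{(1-b)\left[(1+b)^2 - a^2\right]}, \qquad q_{22} = \frac{2(1+b)}{(1-b)\left[(1+b)^2 - a^2\right]} \qquad \text{(D.13)}$$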
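The closed form (D.13) can be checked numerically. The NumPy sketch below solves $A^TQA - Q = -C$ with C = I as a linear system in the entries of Q, using the Kronecker-product form $(I - A^T \otimes A^T)\,\mathrm{vec}(Q) = \mathrm{vec}(C)$, and compares the result with (D.13). It also confirms that Q is positive definite. The three (a, b) pairs are arbitrary stable examples, and the function names are hypothetical.

```python
import numpy as np

def q_closed_form(a, b):
    """Q from (D.13), valid for C = I."""
    d = (1 - b) * ((1 + b) ** 2 - a ** 2)
    q11 = 1 + 2 * b ** 2 * (1 + b) / d
    q12 = 2 * a * b / d
    q22 = 2 * (1 + b) / d
    return np.array([[q11, q12], [q12, q22]])

def q_numeric(a, b, C=np.eye(2)):
    """Solve A^T Q A - Q = -C via the vectorized (Kronecker) form."""
    A = np.array([[0.0, 1.0], [-b, -a]])
    M = np.eye(4) - np.kron(A.T, A.T)
    return np.linalg.solve(M, C.reshape(4)).reshape(2, 2)

for a, b in [(-1.6, 0.9), (0.5, -0.3), (1.2, 0.4)]:
    Q = q_closed_form(a, b)
    print(np.allclose(Q, q_numeric(a, b)), np.all(np.linalg.eigvalsh(Q) > 0))  # True True
```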



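Defining $\omega = \|A^TQB\|$, where

$$A^TQB = \begin{bmatrix} 0 & -b \\ 1 & -a \end{bmatrix}\begin{bmatrix} q_{11} & q_{12} \\ q_{12} & q_{22} \end{bmatrix}\begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} -b\,q_{22} \\ q_{12} - a\,q_{22} \end{bmatrix},$$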
then

$$\omega = \max\left(|b\,q_{22}|,\; |q_{12} - a\,q_{22}|\right)$$

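and substituting into equation (D.11) the values

$\bar u = 1/2$, since $|u(n)| = |e(n)| \le 0.5$,

$\lambda_{\min}(C) = 1$,

$$B^TQB = \begin{bmatrix} 0 & 1 \end{bmatrix}\begin{bmatrix} q_{11} & q_{12} \\ q_{12} & q_{22} \end{bmatrix}\begin{bmatrix} 0 \\ 1 \end{bmatrix} = q_{22},$$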

the following state bound is obtained:

$$|x^*(n)| \le \frac{1}{2}\sqrt{\frac{\lambda_{\max}(Q)}{\lambda_{\min}(Q)}}\left[\omega + \sqrt{\omega^2 + q_{22}}\,\right] \qquad \text{(D.15)}$$
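As a numerical sketch of (D.15), the Python code below (assuming NumPy; helper names hypothetical) evaluates the bound for the illustrative coefficients a = -1.6, b = 0.9. It then compares the bound with the largest steady-state amplitude found in zero-input simulations with rounding after addition. The simulations use ties rounded up, 2000 steps, and an arbitrary grid of integer initial conditions. Because the Lyapunov bound is only a sufficient condition, the simulated amplitudes are expected to lie well inside it.

```python
import math
import numpy as np

def qaa_state_bound(a, b):
    """Right-hand side of (D.15): h = 1, C = I, u_bar = 1/2."""
    d = (1 - b) * ((1 + b) ** 2 - a ** 2)
    q11 = 1 + 2 * b ** 2 * (1 + b) / d
    q12 = 2 * a * b / d
    q22 = 2 * (1 + b) / d
    lam = np.linalg.eigvalsh(np.array([[q11, q12], [q12, q22]]))  # ascending order
    omega = max(abs(b * q22), abs(q12 - a * q22))
    return 0.5 * math.sqrt(lam[-1] / lam[0]) * (omega + math.sqrt(omega ** 2 + q22))

def qaa_tail_amplitude(a, b, x_nm2, x_nm1, steps=2000, tail=200):
    """Largest |x*(n)| over the last `tail` samples of the zero-input response
    with rounding after addition (ties rounded up)."""
    x2, x1 = x_nm2, x_nm1
    hist = []
    for _ in range(steps):
        x = math.floor(-a * x1 - b * x2 + 0.5)
        hist.append(x)
        x2, x1 = x1, x
    return max(abs(v) for v in hist[-tail:])

a, b = -1.6, 0.9
bound = qaa_state_bound(a, b)
worst = max(qaa_tail_amplitude(a, b, i, j)
            for i in range(-40, 41, 5) for j in range(-40, 41, 5))
print(round(bound, 1), worst)
```

For these coefficients the bound comes out at roughly 108, while the simulated limit cycles are far smaller. This illustrates how conservative a Lyapunov bound of this type can be.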



Comparing equation (D.15) with the bound derived by S.R.
Parker and Hess [1] for QAM, it can be concluded that the
upper bound on the amplitude of limit cycles for quantization
after addition is half that for quantization after
multiplication.